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ABSTRACT 

We propose a robust, quantitative method to compare the synthetic light curves of a Type la Supemova (SNIa) 
explosion model with a large set of observed SNela, and derive a figure of merit for the explosion model's 
agreement with observations. The synthetic light curves are fit with the data-driven model SALT2 which 
returns values for stretch, color, and magnitude at peak brightness, as well as a goodness-of-fit parameter. Each 
fit is performed multiple times with different choices of filter bands and epoch range in order to quantify the 
systematic uncertainty on the fitted parameters. We use a parametric population model for the distribution of 
observed SNIa parameters from large surveys, and extend it to represent red, dim, and bright outliers found 
in a low-redshift SNIa data set. We discuss the potential uncertainties of this population model and find it 
to be reliable given the current uncertainties on cosmological parameters. Using our population model, we 
assign each set of fitted parameters a likelihood of being observed in nature, and a figure of merit based on this 
Ukelihood. We define a second figure of merit based on the quality of the light curve fit, and combine the two 
measures into an overall figure of merit for each explosion model. We compute figures of merit for a variety of 
ID, 2D and 3D explosion models and show that our evaluation method allows meaningful inferences across a 
wide range of fight curve quality and fitted parameters. 
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1. INTRODUCTION 

Due to their value as cosmological distance indicators 
(Riess et al. 1998; Perlmutter et al. 1999), Type la Supemovae 
(hereafter SNela) have received intense observational atten- 
tion, with large-scale SN surveys exponentially increasing the 
number of observed SNIa events (see, e.g., Goobar & Lei- 
bundgut 2011, for a review). This trend will continue over the 
next decade with even larger data sets from the Dark Energy 
Survey (DES, Bernstein et al. 2012), Pan-STARRS (Tonry 
et al. 2012) and the Large Synoptic Survey Telescope (LSST, 
Ivezic et al. 2008). These surveys also undertake strong efforts 
to overcome calibration issues, which currently represent the 
largest source of uncertainty in using SNela to measure of 
the dark energy equation of state parameter w (Conley et al. 
2011). As calibration uncertainties are reduced, astrophys- 
ical uncertainties become a larger fraction of the total error 
budget, and a theoretical understanding of SNela may help to 
reduce those uncertainties. For example, after a decade of ob- 
servational efforts to reduce the intrinsic Hubble scatter, only 
modest progress has been made (see, e.g., Silverman et al. 
2012, for a thorough analysis). Thus, we will need better the- 
oretical models of SNela to understand the nature of intrinsic 
scatter, and other systematics such as the correlation with host 
galaxy environment (Kelly et al. 2010; Lampeitl et al. 2010; 
Sullivan et al. 2011). A theoretical framework might lead to 
a physically motivated parametrization of these effects with 
measurable parameters. 
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Unfortunately, current theoretical models are not yet able 
to investigate subtle systematic effects like these. While the 
general consensus is that SNela arise from the thermonuclear 
explosion of a white dwarf in a binary system (Whelan & 
Iben 1973), it is still unclear whether the companion star is a 
main sequence star or red giant (single-degenerate scenario), 
or another white dwarf (double-degenerate scenario, Iben & 
Tutukov 1984). Recent observations disfavor the red giant 
scenario (Li et al. 2011; Brown et al. 2012), but the ques- 
tion is far from settled (see, e.g., Maoz & Mannucci 2012, 
for a review). Even within the single-degenerate paradigm 
there is still no consensus as to the exact explosion mecha- 
nism (see, e.g., Hillebrandt & Niemeyer 2000, for a review). 
The deflagration-to-detonation model (DDT, or delayed det- 
onation, DD, e.g. Khokhlov 1991; Niemeyer & Hillebrandt 
1995; Niemeyer 1999; Kasen et al. 2009; Krueger et al. 2012; 
Seitenzahl et al. 2012) has emerged as the most popular 
model, but the exact mechanism by which a detonation is ini- 
tiated is still unknown (Ropke 2007; Woosley 2007; Woosley 
et al. 2009). Furthermore, the field has only recently moved 
on to three-dimensional simulations, and results suggest that, 
due to the effects of turbulence, lower-dimensional results are 
not reliable (Ropke & Niemeyer 2007). Another surprising 
result of multi-dimensional simulations was the discovery of 
an alternative explosion mechanism, the gravitationally con- 
fined detonation (GCD, Plewa et al. 2004; Townsley et al. 
2007; Jordan et al. 2008; Meakin et al. 2009; Jordan et al. 
2012b). Besides the DDT and GCD, a variety of other mod- 
els has been investigated, such as the double-degenerate sce- 
nario (Dan et al. 2011, 2012; Pakmor et al. 2012), double- 
detonation and sub-Chandrasekhar models (Livne & Arnett 
1995; Shen et al. 2010; Kromer et al. 2010), and failed deto- 
nation models (Jordan et al. 2012a; Kromer et al. 2012). 

With such a multitude of possible explosion models at hand, 
we need to use observed SNela to discern between promising 
and invaUd models. The main difficulty in performing such 
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validation is that SNIa light curves and spectra are quite di- 
verse, meaning that there is no master light curve or spectral 
template to compare against explosion models (Hoeflich et al. 
1996; Nugent et al. 2002; Benetti et al. 2005; Branch et al. 
2006; Hayden et al. 2010; Blondin et al. 2012). Generally, two 
methods of comparing theoretical models and observations 
have been employed: visual comparisons of light curves and 
spectra, and comparisons of characteristic magnitudes, col- 
ors and decline rates. Visual comparisons entail overplotting 
spectra or light curves of the explosion model in question and 
one or a few observed events. The obvious advantage is sim- 
plicity, but 1) one or a few observed events are not representa- 
tive of SNela as a whole, 2) the results of visual comparisons 
are subjective, and 3) the human eye easily misses details, 
especially when plotting on a logarithmic scale. An alterna- 
tive method is to reduce the high-dimensional space of light 
curves to a few characteristic quantities such as peak mag- 
nitude, Aoti5 (Phillips 1993) and B - V color. For example, 
Khokhlov et al. (1993) and Livne & Arnett (1995) used the 
peak magnitudes and rise times of their explosion model light 
curves to compare to observed data, and find these quantities 
to discern between favored and disfavored models. Over time, 
modelers extended such comparisons to quantities describing 
the decline rate, such as Amv20 or ^w^is (see, e.g., Hoeflich 
et al. 1996; Hoeflich & Khokhlov 1996) and used multi-color 
light curves as well as spectra (see, e.g., Kasen & Plewa 2007; 
Blondin et al. 201 1, 2013). Despite these efforts, robust meth- 
ods to evaluate models remain elusive. Ropke et al. (2012) at- 
tempted to discem between a double-degenerate model from 
Pakmor et al. (2012) and a DDT model using the recently ob- 
served SN 201 Ife (Nugent et al. 201 1). Despite the very dif- 
ferent explosion mechanisms, they concluded it was diflicult 
to verify or falsify either model. 

The main weakness of many of the aforementioned com- 
parison methods is that they employ only a small subset of 
the available data, either particular SNela which may or may 
not be representative of the overall sample, or quantities such 
as peak magnitudes and Amis which rely on one or a few 
measurements in a particular filter band. Besides the some- 
what arbitrary definition of those quantities, much attention 
is often given to the width-luminosity relation (Phillips re- 
lation, Phillips 1993; Phillips et al. 1999; Kasen & Woosley 
2007), though the color-magnitude relation is equally impor- 
tant for standardizing peak magnitudes, and hence for cosmol- 
ogy (Riess et al. 1996b; Tripp 1998; Wang et al. 2003). Never- 
theless, the comparison methods described above were appro- 
priate when only a few SNela had been observed with good 
accuracy. With modern data sets in hand, however, we are 
now in a position to compare theoretical models with a well- 
understood set of observed SNela. The observational com- 
munity uses data-driven models such as SALT2 (Guy et al. 
2007), SiFTO (Conley et al. 2008), mlcs2k2 (Jha et al. 2007) 
and SNooPy (Burns et al. 2011) to fit observed light curves 
and extract a few meaningful parameters which summarize 
the properties of the light curves in question. The results of 
those fits, such as stretch, color and magnitude, are used to 
derive distance moduli for a Hubble diagram. We thus expect 
that these fitting techniques should also work well for com- 
paring theoretical light curves to observations. 

Blondin et al. (201 1) applied data-driven models to the syn- 
thetic light curves of Kasen et al. (2009), but concluded that 
neither mlcs2k2, SALT2 nor SNooPy fit the light curves to 
sufficient accuracy, and thus focus on rise and decline times, 
color evolution and spectral comparisons. In this paper, we 



propose a somewhat different approach. We define a quanti- 
tative measure of light curve quality for all explosion models, 
including those that are not well fit by data-driven models. 
Though we appreciate the value of spectral comparisons, we 
focus on broadband quantities. By virtue of averaging over 
the range of wavelengths in a filter band, the light curves of 
explosion models are less sensitive to the details of the radia- 
tive transport treatment than the spectral features are. Sec- 
ondly, the properties of observed light curves have already 
been captured in data-driven models which allow us to infer 
characteristic quantities for a given explosion model. Spec- 
tral comparison codes such as SNID (Blondin & Tonry 2011) 
quantify how well a spectrum matches a set of templates, but 
do not result in any summarizing quantities such as color and 
stretch. 

Our goal is to design a comparison method which satisfies 
three main criteria, namely 1) avoids arbitrarily picking a sub- 
set of the data such as particular SNela, 2) uses quantities 
which reflect the entire light curve rather than a few particu- 
lar data points such as the epoch of peak brightness, and 3) 
results in a well-defined figure of merit, allowing us to rank 
explosion models by how well they reproduce observations. 
Our strategy is to fit the explosion model light curves with 
SALT2 and compare the fitted stretch, color and peak mag- 
nitude to the measured population of normal SNela, ignoring 
peculiar events.' This procedure returns two separate indica- 
tors of Ught curve quaUty, namely how well the Ught curves 
in question can be fit with SALT2, and how likely the fitted 
stretch, color and peak magnitude are to be observed in na- 
ture. These two indicators are combined into an overall figure 
of merit. We emphasize that this figure of merit should be 
seen as a relative measure of comparison between a set of 
explosion models rather than an absolute measure of quality, 
chiefly because any changes in the SALT2 model or its imcer- 
tainties (e.g. through re-training), or any changes in the exact 
fitting procedure, will lead to different results. 

The details of our method are described in Section 2. In 
Section 3, we give the results of applying our fitting proce- 
dure to a range of explosion models. We discuss potential 
shortcomings of our method in Section 4, and summarize our 
results in Section 5. 

2. METHOD 

The goal of our analysis is to derive a figure of merit indi- 
cating how likely it is that a given explosion model represents 
observed SNela. Figure 1 gives an overview of the different 
stages of this procedure. In Section 2.1 we discuss the hy- 
drodynamic simulations and radiative transfer codes used in 
this analysis, as well as the conversion from spectral energy 
distributions (SEDs) to light curves. In Section 2.2 we in- 
troduce data-driven models, and summarize the most impor- 
tant features of the SALT2 model. In Section 2.3 we describe 
our method for fitting explosion model light curves with the 
SALT2 model, and discuss how we derive a goodness-of- 
fit indicator as well as fitted parameters. In Section 2.4 
we describe the population model of SNela in stretch-color- 
magnitude space. In Section 2.5 we define figures of merit for 
goodness-of-fit and fitted parameters, and combine them into 
an overall figure of merit. 

' Peculiar SNela include events such as 2002cx (Wood-Vasey et al. 2002; 
Li et al. 2003), 2005gj (Aldering et al. 2006; Prieto et al. 2007) and 2005hk 
(Chomock et al. 2006; Phillips et al. 2007). See also Foley et al. (2012) and 
references therein. 
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Figure 1. Flow chart representation of our procedure for comparing ex- 
plosion models to observed SNela. Blue boxes represent purely modeling- 
related steps, i.e. stages in the process where no comparison with observa- 
tions has taken place yet. Conversely, red boxes mark purely observational 
stages. Purple boxes represent stages where information from both simula- 
tions and observations has been combined. 



2.1. Explosion Models and Radiative Transfer 

We are not aiming to introduce new explosion models, or 
to discuss the details of hydrodynamic simulations, but to 
use light curves of various explosion models to test our eval- 
uation method. For this purpose, we pick four classes of 
models which we expect to provide interesting test cases for 
our method. Namely, we use the well-studied W7 model, 
the delayed-detonation models of Kasen et al. (2009, here- 
after KRW09), a recent suite of pure deflagration models 
(Long et al. 2013, in preparation), as well as a suite of phe- 
nomenological, pre-expanded models. These models repre- 
sent very different classes of explosions, and were computed 
in one dimension (ID, W7), two dimensions (2D, KRW09, 
pre-expanded) and three dimensions (3D, pure deflagration; 
light curves for the pure deflagration models were computed 
in ID). We briefly discuss these models below. 

The W7 model (Nomoto et al. 1984; Thielemann et al. 
1986) is a ID phenomenological model. The white dwarf 
does not detonate, but produces a bright explosion due to a 
fast, unphysical deflagration. Due to the simplicity of the W7 
model, it is not a physical model of the explosion. However, 
many radiation transport studies have found W7 to reproduce 
observed light curves and spectra surprisingly well (Jeffery 
et al. 1992; Hoflich 1995; Nugent et al. 1997; Baron et al. 
2006; Kasen et al. 2006; Kromer & Sim 2009; van Rossum 
2012). 

The KRW09 models are discussed in detail in the original 
paper, as well as in Blondin et al. (2011). The suite consists 
of 43 2D delayed-detonation (otherwise known as DDT) ex- 
plosion models. All models are based on the same progeni- 
tor model, but have different distributions of the initial flame 
bubbles, as well as different detonation criteria. The isotropic 
(iso) models, as well as one asymmetric model (asym_l), 
were given an isotropic distribution of between 20 and 150 



flame bubbles. For the other asymmetric models, between 15 
and 105 flame bubbles were distributed within a cone of a 
certain solid angle. The second distinguishing feature was the 
detonation criterion (dc), where each detonation criterion cor- 
responds to a certain range of densities, as well as a minimum 
Karlovitz number which has often been used as a criterion 
for detonation (see, e.g., Niemeyer & Woosley 1997). More 
details on the model setups are given in the supplement to 
KRW09. The spectral and light curve comparisons in Blondin 
et al. (2011) showed that some of the KRW09 models agree 
well with observations, while others are incompatible with 
observed spectra. 

Pure deflagration models rely on the initial deflagration to 
consume and unbind the white dwarf. Thus, they tend to 
burn less material to ^''Ni than detonating models, and pro- 
duce events which are dimmer than standard SNela. We use 
preliminary light curves of a recently completed suite of six 
pure deflagration models (Long et al. 2013, in preparation). 
These high-resolution, 3D simulations were computed using 
the FLASH code (Fryxell et al. 2000) and follow the defla- 
gration of a 1.365 Mq white dwarf The white dwarf was ig- 
nited with a certain number of ignition bubbles, distributed in 
a sphere centered at the origin. The models are labeled by 
the number of ignition bubbles (between 63 and 3500) as well 
as the radius of the sphere (128 km to 384 km), e.g. "Pure- 
Def_0063_128". The explosions produce between 0.28 and 
0.36 Mq of ''^Ni. We emphasize that the nucleosynthesis 
and radiation transport calculations for the pure deflagration 
models are preliminary, and might change once the full set of 
models is presented. 

Finally, we use three 2D, phenomenological, off-centered 
detonation models. These models were created by expand- 
ing an initially stable, Chandrasekhar-mass, white dwarf with 
a radial velocity. The total expansion energy was equivalent 
to 30%, 60% and 80% of the binding energy, respectively. 
We allow the white dwarf to expand, reach its maximum ex- 
pansion and contract. We trigger a detonation during the 
contracting phase of the WD by artificially creating a high- 
temperature region offset from the WD's center This scenario 
is designed to investigate strong asphericity, and thus strong 
variations of light curves with viewing angle. The simulations 
were computed with the FLASH code. In the rest of the paper, 
we refer to these models as "pre-expanded" models. 

Spectral energy distributions (SEDs) for each explosion 
model were computed using two different radiative trans- 
fer codes, SEDONA (KRW09 and pre-expanded models) and 
PHOENIX (W7 and pure deflagration models). 

SEDONA (Kasen et al. 2006) is a Monte-Carlo code which 
sends a large number of photon packets through the ejecta, 
and eventually bins them into an SED. The version of se- 
DONA used can handle two dimensions, and thus returns spec- 
tra as a function of viewing angle. Due to the nature of the 
Monte-Carlo method, the code returns Poisson uncertainties 
on the SEDs due to the finite number of photon packages 
used. However, when integrating the SEDs to obtain light 
curves, the number of photon packets in each broadband fil- 
ter is generally so large that the Poisson error is negligible. 
For all models with sedona radiative transport, 30 viewing an- 
gles were computed. The cosine of the viewing angles were 
evenly spaced, so that each viewing angle has equal chance of 
being observed. The viewing angles quoted refer to the mean 
cosine of the segment in solid angle, ranging from 15° (the 
south pole) to 165° (the north pole). 
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The second radiation transport code used was phoenix 
(Hauschildt & Baron 1999, 2004; van Rossum 2012). phoenix 
numerically solves the special relativistic radiative transfer 
equation using the efficient and highly accurate short charac- 
teristic and operator splitting methods. It samples millions of 
atomic lines individually, and solves for the two-way interac- 
tion between the radiation field and matter in a self-consistent 
manner (i.e., avoiding the use of the local thermodynamic 
equilibrium (LTE) approximation, and thus free scattering pa- 
rameters), phoenix solves for the time evolution using the ra- 
diation energy balance method. The code does not use the 
Sobolev approximation, diffusion approximations or opacity 
binning, phoenix operates in one dimension, assuming spher- 
ical symmetry. 

The SEDs from sedona start 2.5 days after explosion, with 
a cadence of one day, and in wavelength bins of 10 A. The 
phoenix SEDs start 4 days after explosion, with a cadence 
of two days and in about 1000 irregularly spaced wavelength 
bins with an average width of 25 A. As discussed in Section 
2.3, the Poisson uncertainty on the fluxes plays almost no role 
in the light curve fits and was set to a small number. 

Finally, we generate mock light curve observations of the 
explosion models by multiplying their SEDs with filter trans- 
mission functions representing either the Bessell filter sys- 
tem (UBVRI, Bessell 1990) or the SDSS filter system (ugriz, 
Fukugita et al. 1996). This operation is performed using 
the pubHcly available Supernova Analysis software (snana, 
Kessler et al. 2009). The mock observations are generated 
with a cadence of two days, and at a very low redshift of 
z = 0.002 so that K-corrections are unnecessary. 

2.2. The SALT2 Data-Driven Model 

With light curves for a given explosion model in hand, the 
simplest comparison with observations is to pick one or a few 
well-observed SNela, and overplot their light curves. How- 
ever, SNla light curves are heterogeneous, and picking one or 
a few is inevitably a biased representation of light curves as 
a whole. Fortunately, the observational community has de- 
veloped interpolation models, called "data-driven models", in 
order to standardize the brightness of each SNIa. They gen- 
erally constitute complex functions for the spectral evolution 
of SNela with a few free parameters which are determined 
from a least-squares fit to light curves. All other degrees 
of freedom (i.e. the exact shape of the epoch- wavelength 
surface) are fixed by training the model on large sets of ob- 
served events. Thus, data-driven models summarize informa- 
tion from many SNela, and allow a much more robust esti- 
mation of light curve characteristics than comparing to a few 
hand-picked SNela. A drawback of using data-driven models 
is that they are trained on normal SNela, meaning they do not 
capture peculiar SNIa events. 

For the purposes of this analysis, we choose the Spectral 
Adaptive Light curve Template method (SALT2, Guy et al. 
2005, 2007, 2010, hereafter G 10). SALT2 fits broadband pho- 
tometric light curves using a stretch-dependent SED for each 
epoch, and a color law. The model is agnostic as to the physi- 
cal mechanisms which cause decline rate and color variations 
in SNela. A plethora of other light curve models exists in 
the literature, almost all of which could be used instead of 
SALT2. For example, SiFTO (Conley et al. 2008) is similar 
to the SALT2 model, but uses the SED of Hsiao et al. (2007). 
The Multicolor Light Curve Shapes method (mlcs2k2, Riess 
et al. 1996a; Jha et al. 2007) is built on the underlying assump- 



tion that all color variations are due to dust reddening similar 
to that in our own galaxy. SNooPy (Bums et al. 2011) was 
especially designed for infrared (IR) light curve fitting, and 
could be used to extend our analysis into the infrared (given 
reliable IR light curves from explosion models). 

We choose the SALT2 model as it is publicly available and 
has been trained on the most recent SNLS3 sample (GIO). 
Since we use the model extensively in this analysis, we review 
its main features here. SALT2 models the flux as a function 
of wavelength, A, and time, t, such that 

Fit, A) = xo [Foit, A) + xiFiit, A)] X . (1) 

The spectral surfaces Fq and Fi, as well as the color law 
CL(A), are functions with thousands of parameters, all of 
which are fixed by "training" the model on observed data. The 
remaining parameters, xq, x\ and c, are derived from a least- 
squares fit to an individual SNIa event, where light curves in 
all filter bands are fit simultaneously. Since we refer to these 
quantities frequently, we shall briefly review their physical 
meaning. Though the fitted parameter xq is a flux normal- 
ization rather than a magnitude, we will generally refer to the 
corresponding magnitude 

me = -2.5 \o%^^(xq) - 10.095. (2) 

As evident from Equation (1), x\ is the coefficient of the sec- 
ond flux surface F\, raffier ffian a simple stretch quantity, s, 
in the sense of ffie original SALT model (t t/s). However, 
the main component of Fi does correspond to the difference 
of light curves with diflJ'erent stretch values, so that there are 
nearly linear relations between xi and s (Guy et al. 2007), 
as well as between xi and Aotij. Thus, we will refer to Xi as 
stretch. Lastly, c is the coefficient of the color law of Equation 
(1), and is virtually equal to B-V color (Kessler et al. 2013), 

B-V= 1.016c -I- 0.0008a:i -i- 0.0232. (3) 

Due to these close relationships with observable quantities, 
we shall speak of ffie SALT2 parameter space as a stretch- 
color-magnitude space. In the rest of this paper we refer to 
ffie stretch, color and magnitude derived from a light curve 
fit interchangeably as fitted parameters, fit results, or simply 
parameters. 

Though SALT2's spectral training covers wavelengths from 
2000 A to 9200 A, the central wavelengths of filter bands 
are not recommended to lie beyond 2800 A and 7000 A. We 
adhere to ffiis standard by using UBVR and ugr. The cen- 
tral wavelength for the next set of filters, I-band and i-band, 
lie beyond 7000 A. Extending filters past ffie recommended 
wavelength range would introduce uncertainties of a few per- 
cent which would be acceptable for the purpose of fitting ex- 
plosion models, since ffiey typically deviate from ffie SALT2 
light curves by a larger amount anyway. However, computing 
reliable radiative transfer results in the IR has traditionally 
been challenging because the results are highly sensitive to 
ffie ejecta temperature (see, e.g., Kasen 2006). Thus, we re- 
frain from including I-band in our comparisons. Our meffiod, 
however, can easily be extended to include ffie IR. 

So far, we have discussed the necessary background for per- 
forming light curve fits and deriving xi, c and m^. In Section 
2.4 we will also derive a population model for those parame- 
ters, and ffius be concerned wiffi the correlations between the 
populations in each parameter. SALT2 models ffie correlation 
between stretch and peak brightness wiffi a parameter a, and 
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the correlation between color and peak brightness with a pa- 
rameter yS. In addition, a magnitude offset Mo allows us to 
translate to absolute magnitudes (Guy et al. 2007). The three 
additional parameters are not derived from the light curve fit, 
but in a second, independent fit which, given a certain set of 
observed SNela, minimizes the scatter in the Hubble diagram. 
For each SNIa in the set, the distance modulus is estimated as 

A^est = m^-MQ + axi - ySc (4) 

where x\ , c and otb are derived from the light curve fit, and 
Mo, a and /J are determined by the Hubble diagram fit (see, 
e.g., Marriner et al. 2011). Thus, the results for Mo, a and ji 
depend on the sample of SNela used for the Hubble diagram 
fit, and their values may vary between surveys (see Section 
4.1). It is important to note that using SALT2 for our light 
curve fits does not force us to adopt the model of Equation (4) 
when describing the correlations between stretch, color and 
magnitude. It is, however, by far the most intuitive and well 
described way to use the results of SALT2 fight curve fits to 
describe those correlations, and we thus adopt it. 

For explosion models, we can simplify Equation (4) since 
we know the true distance modulus /i at which the mock ob- 
servations were generated. We translate the observed mag- 
nitude into an absolute magnitude, Mb = - //, giving an 
expression for the absolute magnitude we expect a SNIa to 
have based on its stretch and color, 

Mb = Mq - ax\ + pc . (5) 

We note that Mb represents the best-fit peak magnitude based 
on light curves in all filter bands, and is not the magnitude of 
the B-band at the epoch of peak brightness. Furthermore, the 
relation between Mb and otb, and thus Equation (5), are only 
valid at z = 0. Since we generate the mock light curves of our 
explosion models at a negligible redshift, this caveat does not 
pose a problem. We describe how we derive Mb for higher 
redshift data in Appendix A.4. 

2.3. Fitting Explosion Model Light Curves 

The goal of fitting simulated light curves to a data-driven 
model is to estimate the UkeUhood that the underlying explo- 
sion model could represent an observed SNIa. Accepting the 
paradigm that SNIa light curves are described by an analytical 
function with a few free parameters (xi, c and m^, in the case 
of SALT2), we split this likelihood into two separate compo- 
nents. First, the quality of the fit itself indicates whether the 
synthetic light curves represent a SNIa. Secondly, the result- 
ing best-fit values for xi, c and corresponds to a likelihood 
of observing a SNIa with those values. The second part will 
be discussed in Section 2.4, while this section deals with how 
the SALT2 model is used to fit explosion model fight curves 
rather than observed SNela. 

In the ordinary case of fitting observed data, a light curve 
fit refers to a least-squares fit of a set of given SNIa observa- 
tions to Equation (1), redshifted into the observer frame and 
multiplied by the filter bands used in the observations. We use 
the SNANA light curve fitting program to apply the same proce- 
dure to our mock observations of explosion models. However, 
there are some differences between explosion models and ob- 
served data which deserve further discussion before we de- 
scribe the more technical aspects of the fitting procedure. 

When we fit explosion model light curves, SALT2 takes 
on the role of "the model" while the simulated light curves 
serve as "the data". Of course, the simulated light curves do 



not carry the measurement uncertainties typically associated 
with SNIa observations. In order to follow the model-data ap- 
proach faithfully, we could use snana to simulate the survey 
conditions of, say, SDSS, and generate mock observations at a 
variety of redshifts, introduce reddening and atmospheric con- 
ditions, and propagate them through a virtual telescope, cam- 
era and processing pipeline. Such mock observations would 
then carry flux uncertainties similar to those of the survey 
which was simulated. There are, however, two reasons not 
to follow this procedure. First, we would inevitably introduce 
selection biases, and discard information about the explosion 
model light curves. Secondly, the simulated statistical uncer- 
tainties would carry no information about the systematic un- 
certainties of explosion models, which are much larger than 
the typical statistical uncertainties in the data. 

The systematic uncertainties can be spfit into two cate- 
gories, parametric and non-parametric errors. Parametric 
sources of error include the progenitor's mass and C/O abun- 
dance ratio, and the number, locations and sizes of the igni- 
tion bubbles. Non-parametric systematic errors are caused by 
missing or uncertain physics, for example in the flame model, 
the detonation model, the energy deposition of gamma rays, 
the ray tracing algorithm, or the number and properties of 
spectral lines in the radiation transfer calculation. More mun- 
dane sources of systematic errors include inadequate spatial 
and temporal resolution in the explosion or radiation trans- 
fer phase, an inadequate number or distribution of Lagrangian 
tracer particles used to determine the abundance of elements, 
or an inadequate number of rays, spectral bins, or angular 
bins in the radiation transfer phase. Unless the simulated sur- 
vey conditions are extremely poor, a combination of all those 
sources of systematic uncertainty far outweighs the statistical 
errors. 

As systematic errors typically dominate over statistical un- 
certainties, and because we wish to avoid discarding light 
curve information by simulating survey conditions, we choose 
to ignore statistical uncertainties entirely. Instead, we place 
the simulated SNela at a low redshift and set their statistical 
uncertainties to a small number, ensuring that the uncertain- 
ties on the SALT2 model dominate the error term in the 
computation. This strategy has two significant consequences. 
First, the;f^ per degree of freedom (x^/Ndot) of the light curve 
fits is generally much greater than for observed data. Sec- 
ondly, we cannot interpret the results of the SALT2 fits in a 
statistically meaningful fashion. For example, while 
from the fit to a set of observed SNIa light curves is ordinar- 
ily interpretable as the log-likelihood of the event having been 
a SNIa, we do not assign such interpretation to an explosion 
model. 

While the lack of rigorous error treatment may be a sober- 
ing realization, we are not trying to compute the likelihood of 
a certain model representing a SNIa, but rather we are solving 
an optimization problem. We are trying to identify which ex- 
plosion models, or which classes of explosion models, come 
closest to reproducing the light curves of real SNela. Our 
methodological approach can yield information about the val- 
ues of the model parameters that are favored and disfavored 
given the current sample of observed SNela. It can also pro- 
vide clues as to non-parametric sources of systematic error. 
For these purposes, we do not need to derive an absolute like- 
lihood. Thus, we shall avoid quoting quantities which are 
usually assigned statistical meaning, such asx^lN^of, and in- 
stead focus on indicators of fight curve quaUty which can be 
computed without invoking uncertainties on the fight curves 
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Figure 2. The influence of filter band choice on SALT2 fit results. Each of the left thi'ee columns represents a fit using different filter bands. The choices of filters 
and epoch range, as well as the fit results, are indicated in the top panels. The black curves show the l()7°viewing angle of the KRW09_asym_6_dc_2 model. 
All fits include epochs between - 10 and +20 days, with the red dashed parts of the light curve excluded. The excluded epoch range can shift by a few days, as 
the fitted epoch of B-band peak brightness can differ froin that of the explosion model light curves. The dashed blue curves show the SALT2 fits, with sfiaded 
blue regions indicating the Icr confidence regions of the SALT2 model. The three columns correspond to three difl'erent choices of filters (UBVR, BV and ugr), 
demonstrating that this choice can influence the fit results, particularly when the fit is not very good as seen in the left column. In the UBVR fit, the SALT2 model 
cannot accommodate all magnitudes simultaneously. In the BV fit, the peak magnitudes of the B and V bands can be fit, but this results in a significantly higher 
value for c (redder color). The ugr fit, however, results in a slightly lower c value than the UBVR fit. Right column: UBVR fits to the observed SN 2002bo (CfA3 
data, see Appendix A. 2). Even though epochs before -10 and after +20 days are excluded from the fit, the fit is compatible with the data at all epochs. 
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Figure 3. Same as Figure 2, but for different fitted epoch ranges. The light curves show the 92°viewing angle of the KRW09Jso_l_dc_5 inodel. In the left 
column, only epochs from - 10 to +20 days from B-band peak brightness were included in the fit, as indicated by the dashed red, excluded regions. In the second 
column, early epochs were included as well (-20 to +20 days), significantly degrading the fit as the SALT2 model cannot accommodate the shallow rising slope 
of the light curves. Including the late epochs (third column) makes very little difference for this particular model. The right column demonstrates that the choice 
of filter bands makes little difference in this case, since the ugr fit from -20 to +20 days results in values for x\, c and ma which are very similar to the UBVR 
fit. Note that the main reason the values are different is that the light curve fit is poor in the first place, at least when including the entire epoch range. 
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themselves. 

Having decided to forego a statistical interpretation of the 
Ught curve fits, we face another issue which causes systematic 
uncertainty in the fit results. We need to make certain choices 
regarding the light curve fits, two of which turn out to influ- 
ence the results significantly: the choice of filter bands, and 
the epoch range included in the fit. Figure 2 demonstrates the 
impact of the first choice. Adding or omitting filter bands can 
change the fit results, as shown by the first two columns in 
Figure 2. While fitting the entire UBVR set analyzes a larger 
wavelength range, fits to a subset may result in more accurate 
fits to those bands. For example, fits to BV only (center col- 
umn of Figure 2) might lead to a better estimate of B-V color 
or B-band peak brightness. Furthermore, there is no clear rea- 
son to prefer, for example, the Bessell filter system (UBVR) 
over the SDSS system (ugr), but they result in slightly difi'er- 
ent fitted parameters (third column of Figure 2). Especially 
the color parameter, c, is aff'ected by the choice of filters, since 
it correlates closely with B-V color (Equation (3)). 

Secondly, the fit results can depend on the choice of fit- 
ted epoch range, particularly when the Ught curve fit is poor, 
as demonstrated in Figure 3. A simple argument could be 
made that one should always include as many epochs as pos- 
sible, if the radiative transfer code provides them. In prac- 
tice, however, this is not the best choice. First, the early and 
late epochs can contribute a disproportionate amount of^^ per 
epoch. This emphasis appears when an explosion model fails 
to match the rising or falling slope, since the flux uncertainties 
on the SALT2 model are smaU at early and late times (though 
the magnitude errors are quite large). While such disagree- 
ment highlights a real problem with the explosion model, it 
can drive the fit away from the best fit around peak. In princi- 
ple, an explosion model should of course be able to reproduce 
observed Ught curves at aU epochs, but almost no explosion 
model to date has reached this level of agreement with data. 
Secondly, one might be interested in the performance of a par- 
ticular explosion model around peak, not at early or late times. 
Third, when comparing explosion models we do not wish to 
penalize a model for going out further in time than others. 
Such a penalty would inevitably occur if the fitted epoch 
range was varied between models. 

The biasing of fit results due to our choices of filter bands 
and epoch range is expected, since those choices simply cor- 
respond to weighting certain regions of the epoch-wavelength 
surfaces stronger than others. For explosion models which 
perfectly fit the SALT2 Ught curves, however, we expect no 
such bias. We tested this assumption by generating ideal light 
curves from the epoch-wavelength surfaces of the SALT2 
model, and then fitting them with SALT2. As expected, fits 
to these ideal light curves reproduced the input magnitude, 
stretch and color to better than 0.1% error, regardless of which 
epoch range or filter bands were fit. As a second check, we 
tested the impact of epoch range on the fitted stretch param- 
eter when fitting observed data. Since data are measured in 
one particular filter system (e.g. UBVRI), we cannot test the 
impact of using other filter systems (e.g. ugriz). We chose 
two small sets of events from the CfA3 survey (Appendix 
A.2) which have data points before -10 and after -1-20 days. 
The first set was chosen to contain only very good fits with 
~ I, while the second set consisted of SNela with rela- 
tively poor fits, > 1, such as the sub-luminous 2005hk. 
All events except 2005hk were part of the SALT2 training set 
(GIO). When fitted with different epoch ranges and subsets 
of filters, the fitted parameters for the set with good fits show 



very little variation (typically about 0.1 ~ 0.2 in the stretch 
parameter xi, corresponding to a 0.01 ~ 0.02 correction in 
peak magnitude). For example, the light curve fit to 2002bo 
(shown in the right column of Figure 2) is compatible with 
the data at all epochs, even though epochs before -10 and af- 
ter -1-20 days were excluded. The differences between fits of 
the poorly fit set were about twice as large as for the well-fit 
set, as expected. However, even the poorly fit set shows mod- 
est variations with epoch range compared to many explosion 
models (Section 3). This test demonstrates once again that 
the results of high-quality light curve fits are less sensitive to 
the fitted epoch range than the results of poor fits. Since ob- 
served data generally fit much better than explosion models, 
their fitted parameters change relatively Uttle when changing 
the fitted epoch range. 

In conclusion, the choice of filter bands and epoch range 
systematically bias the results of light curve fits to explosion 
models which are not well fit by SALT2. The choice of filter 
bands has a particularly strong impact on color, and the epoch 
range on stretch. The peak brightness is generally less sensi- 
tive to the details of the fitting procedure. Since there is no 
clear choice, we fit models with a wide range of reasonable 
filter and epoch range choices. Here, we fit each light curve 
with three sets of filters (UBVR, BV and ugr) as well as four 
epoch ranges (-10 to +20, -10 to +40, -20 to +20 and -20 
to +40), resulting in a total of 12 fits. The three sets of filter 
bands allow us to fit the maximum wavelength range (UBVR), 
get the best fit to the well-constrained B and V bands (BV), 
and investigate an alternative filter system (ugr). The set of 
epoch ranges was chosen to include and exclude the regions 
of -20 to -10 days, and +20 to +40 days, since many ex- 
plosion models struggle to match the SALT2 model at those 
epochs. The dispersion in the fitted parameters x\, c and mb 
from the 12 fits is interpreted as a systematic imcertainty on 
those parameters. As outlined above, this dispersion is gen- 
erally larger for explosion models which are poorly fit in the 
first place, meaning that the results are better constrained for 
explosion models which agree best with data. This correlation 
is reassuring, since we care somewhat less about how poorly 
a model fits the data, but do care in cases where the model fits 
the data well. 

Finally, we wish to define a goodness-of-fit parameter for 
a light curve fit. As discussed above, x^/^dof cannot be in- 
terpreted in the usual statistical context, since, in the absence 
of uncertainties on the explosion model, the formal x^ /Ndof 
gives a small likelihood for any of the light curve fits in this 
analysis. Thus, we define a;^^^-like goodness-of-fit parameter, 

/-2 1 V' / ■'^ i.model " F i,SALT \^ , 

where F, is the flux at epoch i and ctisalt the uncertainty on 
the SALT2 model at that epoch. The main difference between 
and x^/^dof is that is agnostic of uncertainties on the 
explosion model light curves. Thus, we once again emphasize 
that no statistical interpretation should be derived from the 
value of 

2.4. SNIa Population in Color-Stretch-Magnitude Space 

In the previous step, we quantified the agreement between 
explosion model light curves and the SALT2 model, indicat- 
ing whether they represent a Type la-like explosion. However, 
there are three free parameters in the SALT2 fit, xi, c and 
otb- Even if the light curves in question are well fit with the 
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Table 1 

Parameters for the population model of SNe in - c - Mb parameter space 



Parameter 


Value 


Description 


Reference 


a 


0.13 


Slope of Mb-xi relation (derived from Hubble diagram fit) 


Conley et al. (201 1); Sullivan et al. (201 1) 


P 


3.2 


Slope of Mb-c relation (derived from Hubble diagram fit) 


Conley et al. (201 1); Sullivan et al. (201 1) 


Mo 


-19.095 


Uncorrected B-band peak magnitude (derived from Hubble diagram fit) 


Guy et al. (2010) 


Ho 


VOkms"' Mpc"' 


Hubble constant (degenerate with Mq) 


Guy et al. (2010) 


O-M 


0.13 


Intrinsic scatter in Mb (including uncertainty in Ho) 


Kessleretal. (2013) 


J"xl 


0.5 


Mean of distribution in xi 


Kessleretal. (2013) 


0"xl- 


1.4 


Standard deviation for X[ > fi^i 


Kessler et al. (2013) 


0"xl+ 


0.7 


Standard deviation for xi < fi^i 


Kessler et al. (2013) 


J"c 





Mean of distribution in c 


Kessler et al. (2013) 


O-c- 


0.08 


Standard deviation for c > fJc 


Kessler et al. (2013) 


C"c+ 


0.13 


Standard deviation for c < fi^ 


Kessler et al. (2013) 


Wc.2 


0.04 


Weight of secondary distribution in c 


This paper 




0.3 


Secondary distribution: Mean in c 


This paper 


O-c-,2 


0.2 


Secondary distribution: Standard deviation for c > pc,2 


This paper 


Cc+,2 


0.2 


Secondary distribution: Standard deviation for c < //c,2 


This paper 


WM,2 


0.04 


Weight of secondary distribution in Mb 


This paper 


A'M,2 


-19.095 


Secondary distribution: Mean in Mb 


This paper 


OTVI,2 


0.35 


Secondary distribution: Standard deviation for Mb 


This paper 



Note. — The quoted value for Mq given in GIO refers to the SiFTO light curve fitter rather than SALT2. Because SiFTO has dift'erent corrections for 
color and stretch, the uncorrected magnitude Mo is slightly different, but can be converted to the SALT2 value using linear transformations (Guy, private 
communication). The corresponding value of Hq used in their analysis is 70 km s"' Mpc"' . See Section 4.1 for a detailed discussion. 



SALT2 model, there is no guarantee that the resulting param- 
eters correspond to values observed in nature. Namely, the 
explosion model could suffer from physically unreasonable 
color and stretch properties, or not obey the magnitude-stretch 
and magnitude-color relations. Furthermore, we wish to eval- 
uate the likelihood that a certain set of fit parameters could be 
observed in nature. Hence, we need a model for the popula- 
tion of SNela in the me -xi -c parameter space. In the context 
of SALT2, xi and c are independent parameters, whereas the 
peak magnitude is correlated with xi and c. Thus, generat- 
ing a population model consists of two somewhat indepen- 
dent steps: quantifying the observed distributions of stretch 
and color, and quantifying the relations between stretch, color 
and magnitude. 

First, we need to model the distribution of stretch and color 
values in observed SNela. The observed distributions of xi 
and c have been quantified for various surveys, but surveys are 
subject to significant observational biases. Here we are inter- 
ested in the underlying distribution of xi and c, correspond- 
ing to the results of an ideal, unbiased survey. These unbiased 
distributions were modeled by Kessler et al. (2013, hereafter 
K13), who determined the distributions by comparing Monte- 
Carlo simulations with survey data. The Monte-Carlo simula- 
tions, run using snana, assumed a certain parent population in 
stretch and color, and forward-modeled various observational 
biases. These efl'ects included an improved signal-to-noise 
model and the corresponding cuts, Malmquist bias, search ef- 
ficiency, as well as the efficiency of the spectroscopic selec- 
tion. The most likely underlying population was determined 
by matching the results of the simulation with the observed 
distributions of 251 SNela from the Sloan Digital Sky Survey 
(SDSS, Appendix A.l) and 191 SNela from the Supernova 
Legacy Survey (SNLS, Astier et al. 2006, GIO). The under- 
lying populations were modeled as asymmetric, continuous 
Gaussians such that for a parameter 6 = (xi, c). 



dP(e) 

de 



1 



n (0-0+ + (Te-) 



exp 



(e-i^»f \ y n ^ 



(7) 



Table 1 shows the values for jig and the various standard de- 
viations used, as determined by K13. 

Having quantified the distributions of stretch and color, we 
now move on to their correlations with magnitude. In Sec- 
tion 2.2 we described how the SALT2 model deals with the 
relations between xi, c, otb> and Mb using the parameters Mq, 
a and jS which are derived from a Hubble diagram fit. Since 
explosion models are not guaranteed to follow the relations 
between stretch, color and magnitude observed in nature, we 
do not use a set of explosion models to derive Mq, a and p. 
Instead, we pick sensible values derived from large SNIa sur- 
veys (Table 1). We discuss these choices in detail in Section 
4.1. 

Equation (4) suggests that the fitted (and thus Mb) is 
perfectly correlated with x\ and c, and assumes no residual 
scatter in the Hubble diagram which is unrealistic. Thus, we 
add a scatter term. Am to the absolute magnitude, giving the 
final relation 



Mb = Mo - axi -i- ySc -h Am 



(8) 



where Am is assumed to be drawn from a symmetric Gaus- 
sian distribution with mean zero and standard deviation ctm- 
The standard deviation includes both the intrinsic scatter as 
well as the uncertainty of the Hubble constant, which is a sub- 
dominant but relevant contribution. We are now in a position 
to write down the probability density of SNela in the three- 
dimensional space of stretch, color and absolute magnitude. 
The probability distribution is the product of three indepen- 
dent Gaussians in xi, c and Am, and can easily be transformed 
from the magnitude scatter Am to absolute magnitude. 



dP{x\,c,M-Q) en e^"^ dx\ dcdM-Q 



(9) 



with 



2^ (Xi -jUxl)^ {C - lief A|j 
X 2 2 2 

^ (xi -jUxi)^ ^ (c - Hcf ^ (Mb -i-orxi -pc-Mpf 
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where cti-, and cr^ are understood to take on different values 
above and below the respective means of their distributions. 
We note that all cr denote the widths of the population distri- 
butions rather than fitted errors. A more formal and general 
derivation of this result is given in Appendix B. 

Figure 4a shows confidence contours for the likelihood of 
Equation (9). Each panel shows the contours in two variables, 
marginalized over the third variable. Note that in the cases 
of the x\ - Mb and c - Mg planes, the marginalization over 
asymmetric Gaussian distributions leads to a two-dimensional 
probability which is not itself Gaussian any more. Expres- 
sions for the probability density shown in Figure 4 are given 
in Appendix B. 

To check that our population model differs from biased 
survey data in ways which we expect and can explain, we 
overplot data from the SDSS survey in Figure 4a. We re- 
view the SDSS survey, the cuts applied to the sample, and 
our procedure for estimating the rest-frame peak magnitude in 
Appendix A. The differences between the overplotted SDSS 
data and the population model clearly demonstrate why sim- 
ply comparing the fit results from explosion models to those 
of survey data would be inappropriate. The SDSS data is 
strongly biased toward blue SNela (low c) and slow de- 
cliners (high xj), which are synonymous with bright events 
(Malmquist bias). All outliers are compatible with the 3cr 
confidence contours, at least within their Icr error bars. 

However, there is one potential issue with the population 
model derived from SDSS and SNLS data. From the survey 
data used to determine the xi and c distributions (see Figures 
1 and 2 in K13), one might suspect that the dim ends (red, fast 
decliners) of these distributions are poorly constrained. In or- 
der to investigate these regions of parameter space, we turn 
to a low-redshift survey, the CfA3 sample of nearby SNela 
(reviewed in Appendix A.2). The CfA3 sample was not con- 
sidered in the original population model since its survey bi- 
ases could not be modeled as accurately as those of SDSS 
and SNLS, but it contains a higher fraction of red, dim events. 
Thus, the Cf A3 survey should be ideal for the purpose of find- 
ing outUers in the SNIa population. In addition, we use the 
well-observed set of SNela used in Jha et al. (2007), and re- 
fer to the combined sample as the nearby sample. Figure 4b 
shows the nearby sample plotted over the contours of our pop- 
ulation model. The comparison demonstrates that the nearby 
sample contains some objects which lie significantly outside 
the xi - c range expected from the population model of KI3 
(with a sample of this size, there should be zero or one ob- 
jects outside the 99.7% contour). In particular, there appears 
to exist a very red population not represented by the popula- 
tion model. Secondly, the xi - Mb plot suggests that there 
are objects with both brighter and dimmer magnitudes than 
expected. The cuts described in Appendix A. 3 ensure that 
the extreme fit results are not artifacts of poor light curve fits. 
However, we cannot exclude the possibility that at least a frac- 
tion of the red, dim SNela experienced significant host galaxy 
reddening. Nevertheless, as long as there is a possibility that 
the red colors and dim magnitudes are intrinsic, the popula- 
tion model should not exclude explosion models with similar 
characteristics. 

To account for the tails in the SNIa population, we extend 
the population model to include a second Gaussian, making 
up 4% of the overall population, in both c and Mb. The 
means and standard deviations for the secondary populations 
are listed in Table 1. The second Gaussian in color is cen- 
tered at c = 0.3, with cr = 0.2, adding a total 1.3% likelihood 



for a SNIa to be observed with a value of c greater than the 
previous 3cr limit of c = 0.39. Since the second Gaussian is 
centered at red colors, it adds virtually no probability where 
c < 0. The second Gaussian in Mb is centered at the original 
value of Mq, and thus simply allows for an outlier population 
at low and high magnitudes, adding a total of 0.5% likeli- 
hood each above and below the previous 3cr limits. Contrary 
to color and magnitude. Figure 4b shows no evidence for an 
extreme population in xi . Figure 4c demonstrates that the ex- 
tended population model does a better job of accounting for 
the extreme objects of the nearby sample. Figure 5 shows 
the same extended population model, for the corrected mag- 
nitude, Mb + ax\ - /3c, instead of Mb. The advantage of this 
variable transformation is that the Gaussians are independent 
in all three variables, and it is thus easier to see which parts of 
the 3D parameter space are allowed. The plots of Mb can be 
somewhat misleading, since they show projected confidence 
contours, not the 3D likelihood. A more detailed mathemati- 
cal description of the population model is given in Appendix 
B. 

Furthermore, we note that the contours in the Xi - Mb plane 
of Figure 4c indicate a less tight correlation between stretch 
and peak magnitude than the PhilUps relation between Amis 
and Mb (see, e.g.. Figure 1 of Hillebrandt et al. (2013) or 
Figure 7 of Blondin et al. (2011)). There are a few rea- 
sons for the weaker correlation. First, tight Phillips relation 
fits are often derived from color-corrected peak magnitudes 
(e.g. FoIateUi et al. 2010), while Figure 4c shows uncorrected 
magnitudes. Removing the component of the scatter in mag- 
nitude which is correlated with color reduces the scatter in 
the stretch-magnitude relation significantly. Secondly, our ex- 
tended population model accounts for dim objects which have 
been shown to lie off the main Phillips relation (Foley et al. 
2012). Finally, the stretch parameter xi is derived from an 
entire set of light curves, rather than two particular epochs in 
B-band, and there is no reason to assume that its correlation 
with peak magnitude would be as tight as that of Am 15. 

While the extended population model seems to do well at 
describing the nearby sample, we emphasize that the purpose 
of the extension is not to make accurate statements about the 
actual, underlying population of SNela, since it was not de- 
rived from data in a rigorous fashion. Given the scarcity of 
current data, such an investigation will be a future project (see 
Section 4.1). Here, we simply make sure to include explosion 
models with extreme fitted parameters if there are any reason- 
ably well observed events with similar parameters. Adding 
secondary populations with a few percent of the overall pop- 
ulation changes the likelihood closer to the peak of the dis- 
tribution very little. In the context of comparing explosion 
models to the population model, the extended model only dif- 
fers from the basic model when the fit results are somewhat 
extreme. The validity of our population model is further dis- 
cussed in Section 4.1. 

2.5. Figure of Merit 

In the last two sections we described procedures for fitting 
explosion models with SALT2, and derived a goodness-of-fit 
indicator as well as a likelihood of the fit results corre- 
sponding to values observed in nature. We now wish to merge 
these two components into one figure of merit, /, which rep- 
resents a weighted sum of its two constituents. Of course, 
there are countless ways to define /, meaning any procedure 
is somewhat arbitrary and can be changed for any particular 
purpose. Here, we aim to assign equal weight to both the 
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Figure 4. Constructing a population model. Each row shows confidence contours (68.3%, 95.4% and 99.7%) in ,ri - c, .vi - Mb and c - Mb space, marginalized 
over the third variable which is not plotted. Row a): the population model of Kessler et al. (2013), with survey data from SDSS. The comparison demonstrates 
the necessity of a population model, since the survey data are highly biased toward blue (low c), slow-declining (high a'i) events. Row b): the nearby sample 
contains outliers which should not exist according to the population model. Row c): the extended population model does a better job accounting for the nearby 
outliers (see Section 2.4 for details). 
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Figure 5. The extended population model as in Figure 4c, but plotting the connected magnitude instead of Mb . In coirected magnitude space, the basic population 
model consists of three independent Gaussians, and it is easier to visually evaluate how likely an event is according to the model. The black crosses mark the loci 
of maximum probability density. 
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goodness-of-fit and the likelihood of fit parameters. 

For a given set of fitted parameters, we define p to be the 
fraction of the population with a lower likelihood of being 
observed in nature than the set of fit parameters in question. 
This definition means that p = I corresponds to Xi , c and Mb 
at the point of highest likelihood, and, for example, p = 0.01 
indicates than only 1 % of observed SNela have parameter val- 
ues more extreme than those at hand. It makes sense to put a 
lower limit on the figure of merit for p because the population 
model is not defined at very low likelihoods, and the compu- 
tation can return due to round-ofl" error. Furthermore, we 
re-normalize the figure of merit to take on values between 
(worst) and 1 (best) to simplify its interpretation. We thus 
define 

min 

lO V /?<A™n. 

Similarly, we need to define a above which we consider a 
light curve fit to have failed. Incidentally, some light curves fit 
so poorly that the fit fails within snana, and we penalize such 
failed fits by assigning them = While p is automati- 
cally limited io p < 1, there is no clear lower limit for f^. A 

below 1 corresponds to a set of light curves which, on aver- 
age, lie within the statistical uncertainty of the SALT2 model. 
Such an excellent match is achieved by a few individual fits to 
the explosion models considered here, but no model achieves 
^•^ < 1 when averaged over the 12 sets of fits with different 
filters and epoch ranges. Nevertheless, we wish to ensure that 

has both lower and upper bounds, and thus define 



1 

logl 



itfL.)-logiotf') 







lOgloCfLix) 



V 

V 1 < ^2 < ^. 



■2 

max 



(11) 



e2 

>max • 



The overall figure of merit is simply a weighted sum of and 
/f2 , with weights Wp and Wf . As both /p and ff are logarithms 
of p and f^, summing them is equivalent to multiplying p and 
with the weights as exponents. The weights can be set 
arbitrarily to emphasize the quality of either light curve fits or 
fit results. Here, we wish to weight them equally, meaning /p 
and should contribute roughly similar values to the overall 
figure of merit. For the models considered in this analysis, we 
find that this is roughly satisfied for the simple average of /p 
and ff, Wp = = 1, leaving 



Wp/p + ^efe _ fp + fe 

Wp + Wf 2 



(12) 



We are left with the task of choosing sensible values of prmn 
and f such that we do not unnecessarily exclude regions of 
parameter space, but cut off regions where the fits and pop- 
ulation model are not informative. For /p, we conservatively 
choose a limit of p ^m = 10"^ which corresponds to an ex- 
plosion model lying almost 4o" ofl" our population model. At 
such low levels of likelihood, our model is not constrained, 
and the explosion model has been ruled out anyway. For f^i, 
a limit of = 100 seems sensible, since from inspection 
of the light curves it is clear that any fit with a greater has 
failed to converge. 

We emphasize once again that / should not be assigned 
any statistical meaning, since has no statistical meaning; / 
merely serves as a convenient measure of Ught curve fit qual- 
ity. Since each set of light curves is fit multiple times with 



different filter bands and epoch ranges, we average the figures 
of merit of the individual fits, and denote the averaged quan- 
tities (^2), (/p) and (/). When error bars on these quantities 
are given, they refer to the root mean squared (rms) deviation 
from the mean figure of merit. Note that this procedure does 
not imply averaging over viewing angles, which for now we 
treat as individual SNIa events. Where the figures of merit 
are also averaged over viewing angle, we denote them {{f^i)), 
«/p» and «/». 

When discussing the fitted parameters for particular explo- 
sion models (or a particular viewing angle of a model), it 
would be cumbersome to refer to each of the 12 sets of light 
curve fits individually. Instead, we wish to compute the most 
likely x\, c and Mb for the explosion model. Thus, we define 
averages over all fits, (xi), (c) and (Mb). We note that these 
quantities are not used to compute figures of merit, but are 
used for plotting the fit results on the contour plots of the pop- 
ulation model. Since fits with a better goodness-of-fit param- 
eter f ^ are bound to give more accurate estimates of the fitted 
parameters, we define the averaged parameters as a weighted 
average, 

where / runs over all 12 fits except those which failed (mean- 
ing they have > ^^^x' or snana fit failed to converge in 
the first place), (c) and (Mb) are defined similarly. Further- 
more, we compute the rms deviation from the mean parame- 
ters separately for values above and below the mean, respec- 
tively. The rms deviations are not weighted by and serve 
as the error bars shown in Section 3. We emphasize that these 
error bars do not represent 68% intervals of the UkeUhood of 
obtaining a certain parameter, or any other quantity which can 
be interpreted statistically, but merely give an estimate of the 
dispersion of a fitted parameter between fits. When the error 
bars are large, the large rms deviation is usually due to one 
or a few failed fits which return unreasonable results. Since 
the mean parameters, {x\), (c) and (Mb), are weighted by 
they are much more reliable in those cases than the error bars 
indicate. Furthermore, we note that we do not compute fit 
results averaged over viewing angle (such as ((xi ))), but in- 
stead plot separate fit results for each viewing angle, giving an 
impression of the variations of fit results with viewing angle. 

3. RESULTS 

With the machinery developed in the previous section, we 
are now in a position to compute figures of merit in a fully au- 
tomated way. Table 2 lists the figures of merit for goodness- 
of-fit, fitted parameters, and the overall figure of merit for all 
explosion models introduced in Section 2.1. Figure 6 shows 
the distribution of {{ff)) and {{f^)) graphically. The overall 
figures of merit span a range of 0.185 < «/)) < 0.786, most 
of the allowed range of < {{/)) < 1. The figure of merit 
for fitted parameters, {(/p)), spans almost the entire possible 
range, < ((/p)) < 0.98, meaning that the explosion mod- 
els lie both at the very center of the population and entirely 
outside it. The figure of merit for light curve fit, {{ff)), ex- 
hibits a slightly smaller range with 0.24 < {{ff)) < 0.76, but 
it is important to keep in mind that {{ff)) is averaged over 
different fits. As these fits include different filter bands and 
epoch ranges, we expect that fits including UBVR and the 
largest epoch range will always receive a lower f^i than those 
with fewer filter bands and epochs. Thus, we expect the av- 
eraged iff) not to reach the extreme values of 1 or 0. In the 
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Table 2 

Figures of merit for all analyzed explosion models 
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0. 
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0.051 


0.591 
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0. 
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0.65 
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0.761 


0. 
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0.58 
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0.702 


0.064 


0.557 


0.777 


0. 


217 


0.68 


KRW09jso_8_dc_3 




0.696 


0.028 


0.663 


0.766 


0. 


168 


0.52 


KRW09Jso_2_dc_5 




0.683 


0.011 


0.666 


0.707 


0. 


134 


0.50 


KRW09jso_4_dc_4 




0.681 


0.068 


0.499 


0.766 


0. 


152 


0.48 


KRW09jso_2_dc_2 




0.672 


0.042 


0.627 


0.764 


0. 


169 


0.65 


KRW09jso_l_dc_2 




0.671 


0.026 


0.542 


0.688 
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0.570 
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0. 
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0. 
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0.544 
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0.542 
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0.501 


0.607 


0. 


,386 


0.53 
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0.062 
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0. 
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0.54 
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0. 


,059 
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0. 


,144 


0.32 
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0.388 








0. 


,093 


0.24 


PreExpSO 




0.301 


0.076 


0.203 


0.420 


0. 


,237 


0.46 


PureDeL3500_384 




0.212 








0. 


213 


0.40 


PureDef_1700_384 




0.207 








0. 


185 


0.37 


PureDef_1100_256 




0.185 








0. 


161 


0.37 



O-fr 



/p 



o-ft 



0.04 

0.07 
0.11 
0.11 
0.03 
0.02 
0.04 
0.04 

0.07 
0.04 
0.04 
0.05 
0.09 
0.04 
0.06 
0.12 
0.08 
0.04 
0.03 
0.04 
0.04 
0.06 
0.06 
0.09 
0.07 
0.09 
0.12 
0.10 
0.07 
0.05 
0.05 
0.09 
0.07 
0.06 
0.05 
0.05 
0.07 
0.05 
0.06 
0.02 
0.03 
0.02 
0.02 
0.04 
0.07 



0.60 
0.54 
0.58 
0.56 
0.43 
0.49 
0.53 
0.44 

0.54 
0.44 
0.46 
0.39 
0.56 
0.54 
0.49 
0.57 
0.59 
0.55 
0.34 
0.43 
0.40 
0.57 
0.49 
0.56 
0.42 
0.49 
0.56 
0.55 
0.52 
0.56 
0.54 
0.43 
0.54 
0.45 
0.52 
0.46 
0.56 
0.44 
0.36 
0.41 
0.49 
0.38 
0.43 
0.38 
0.43 



0.77 
0.78 
0.83 
0.82 
0.53 
0.55 
0.69 
0.63 

0.78 
0.60 
0.63 
0.57 
0.80 
0.78 
0.64 
0.83 
0.81 
0.80 
0.47 
0.58 
0.56 
0.79 
0.72 
0.81 
0.66 
0.75 
0.82 
0.80 
0.78 
0.85 
0.83 
0.76 
0.79 
0.68 
0.73 
0.60 
0.80 
0.62 
0.57 
0.50 
0.59 
0.46 
0.52 
0.56 
0.65 



0.04 0.40 0.52 



0.29 
0.26 
0.32 
0.30 
0.20 
0.15 
0.20 
0.16 
0.10 
0.30 
0.21 
0.30 
0.21 
0.32 
0.20 
0.23 
0.30 
0.24 
0.40 
0.26 
0.17 
0.16 
0.18 
0.38 
0.19 
0.24 
0.27 
0.26 
0.23 
0.25 
0.41 
0.42 
0.26 
0.38 
0.37 
0.24 
0.27 
0.38 
0.31 
0.20 
0.21 
0.33 
0.22 
0.29 
0.38 
0.25 
0.14 
0.18 
0.18 
0.36 
0.40 
0.32 
0.32 



0.88 
0.90 
0.85 
0.82 
0.98 
0.95 
0.79 
0.84 
0.78 
0.73 
0.87 
0.86 
0.88 
0.69 
0.58 
0.77 
0.65 
0.60 
0.73 
0.88 
0.77 
0.82 
0.55 
0.71 
0.53 
0.68 
0.63 
0.57 
0.56 
0.55 
0.65 
0.61 
0.54 
0.60 
0.61 
0.51 
0.65 
0.59 
0.68 
0.69 
0.66 
0.56 
0.66 
0.61 
0.56 
0.48 
0.65 
0.61 
0.53 
0.14 
0.02 
0.04 
0.00 



0.01 
0.04 
0.02 
0.02 
0.01 
0.04 
0.07 
0.08 

0.07 
0.07 
0.03 
0.12 
0.03 
0.01 
0.10 
0.03 
0.03 
0.01 
0.13 
0.08 
0.06 
0.02 
0.04 
0.04 
0.05 
0.04 
0.04 
0.02 
0.05 
0.02 
0.03 
0.05 
0.03 
0.04 
0.02 
0.03 
0.03 
0.05 
0.07 
0.04 
0.02 
0.04 
0.03 
0.05 
0.06 



0.84 

0.79 
0.83 
0.75 
0.93 
0.80 
0.65 
0.64 

0.56 
0.78 
0.76 
0.54 
0.66 
0.54 
0.65 
0.59 
0.52 
0.71 
0.55 
0.64 
0.74 
0.48 
0.66 
0.46 
0.60 
0.49 
0.50 
0.51 
0.45 
0.61 
0.48 
0.47 
0.53 
0.54 
0.47 
0.57 
0.52 
0.65 
0.57 
0.61 
0.52 
0.63 
0.54 
0.49 
0.35 



0.89 

0.95 
0.88 
0.83 
0.99 
0.99 
0.87 
0.95 

0.80 
0.98 
0.90 
0.96 
0.75 
0.60 
0.96 
0.69 
0.64 
0.75 
0.98 
0.89 
0.95 
0.57 
0.79 
0.59 
0.77 
0.67 
0.62 
0.59 
0.60 
0.68 
0.65 
0.60 
0.63 
0.74 
0.53 
0.72 
0.62 
0.81 
0.81 
0.81 
0.63 
0.81 
0.70 
0.66 
0.57 



0.12 0.00 0.37 



0.06 
0.04 
0.06 
0.06 
0.01 
0.04 
0.19 
0.21 
0.09 
0.17 
0.14 
0.08 
0.11 
0.09 
0.15 
0.30 
0.07 
0.15 
0.08 
0.13 
0.30 
0.21 
0.10 
0.13 
0.10 
0.32 
0.14 
0.08 
0.09 
0.21 
0.08 
0.12 
0.25 
0.09 
0.41 
0.28 
0.41 
0.08 
0.42 
0.28 
0.39 
0.37 
0.43 
0.41 
0.40 
0.19 
0.16 
0.16 
0.18 
0.13 
0.08 
0.11 
0.00 



Note. — The models in this table are sorted by a descending overall figure of merit, ((/>>. Some of the column headers take on slightly 
different meanings compared to the text, as explained in the comments below. The original KRW09 paper lists 44 models, but accidentally no 
radiative transfer results were computed for iso_l_dc_4 (Blondin et al. 201 1). See Section 3 for a discussion of these results. 

* The bold columns /, f^i and fp refer to the viewing angle averaged figures of merit, ((/>>, ((/^2)) and {{fp}}, where multiple viewing angles 
were present. Otherwise, they refer to the figures of merit averaged over fits, (/>, (f^i) and (fp). 

' The three columns to the right of each figure of merit show the standard deviation in viewing angle, cr$, as well as the minimum and maximum 
figure of merit in any one viewing angle, denoted 6,nin and ^max- For models with only one viewing angle, these columns are blank. 

* The fourth column to the right of each figure of merit shows the standard deviation over the 12 fits performed, o"f, . For models with multiple 
viewing angles, erg. is the average of the standard deviations in each viewing angle. Generally, the variance of the figures of merit over the 12 
fits, CTj., is large which is expected, particularly for models which exhibit poor light curve fits (see Section 2.3). 
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Figure 6. The figures of merit from Table 2, witii tlie quality of the light 
curve fit, iif^i)), increasing to the right, and the likelihood to observe a SNIa 
with the fitted parameters in nature, ((/p>>, increasing toward the top. The 
diagonal gray lines and labels show lines of constant overall figure of merit. 
The error bars show the dispersion of the figures of merit between the viewing 
angles of a given model. 

following sections we discuss the results for each family of 
explosion models. 

3.1. The KRW09 Delayed-Detonation Models 

Generally, all KRW09 models do relatively well (0.544 < 
((/)) < 0.786), particularly in matching the population 
model (0.48 < {{ff}) < 0.98). The isotropic models (dai'k 
blue points in Figure 6) tend to rank somewhat higher than 
the asymmetric models (light blue points), with the top 15 
models isotropic (note that second highest ranked model, 
asym_l_dc_2, is really an isotropic model). With 43 models 
in the KRW09 set, a total of 1290 viewing angles and 15,480 
fits, it is difficult to give a visual impression of the various 
light curves and fits. Figure 7 shows three selected light curve 
fits for some of the highest and lowest ranked viewing angles. 
It is noteworthy that the light curves of the highest ranked 
isotropic and asymmetric models look very similar. 

In order to give an overview of the region of SALT2 param- 
eter space occupied by the KRW09 models, the top row of 
Figure 8 shows the location of all KRW09 models on the pa- 
rameter space plots. For clarity, the error bars are omitted. On 
average, the asymmetric models tend to be slightly brighter 
and redder than the isotropic models. The isotropic models 
lie somewhat closer to the center of the population distribu- 
tion in all three panels. This trend is reflected in the higher 
((/p)) that the isotropic models receive on average (Table 2). 

The bottom row of Figure 8 shows the locations of a few 
selected models, including those with the highest and lowest 
((/)), highest and lowest ((/p)) and lowest (/p) of any viewing 
angle. While the error bars on color are quite large, we note 
that the error bars represent the standard deviation given the 
set of 12 fits performed, and are usually dominated by one 
or a few very bad fits. Since the average values plotted are 
weighted averages, the fitted parameters from the bad fit runs 
are essentially ignored, and the average value is presumably 
more reliable than the error bars indicate (see discussion in 
Section 2.5). 

One might suspect that the asymmetric models would ex- 



hibit systematically different light curves with viewing an- 
gle, and thus a larger dependence of their figure of merit with 
viewing angle. Such a trend, however, is not apparent in the 
dispersions of (/) with viewing angle (erg in Table 2), or in the 
parameter space plots in Figure 8. In fact, the model with the 
highest dispersion is iso_8_dc_2 {erg = 0.074), though a low 
dispersion in the figure of merit does not automatically mean 
that the fit results show a low dispersion as well. 

The KRW09 models have been compared with observations 
before, namely in the original KRW09 paper and in Blondin 
et al. (201 1). Figure 3 of KRW09 demonstrates that the mod- 
els broadly follow the Phillips relation, with only a few mod- 
els outside the Icr contour. Furthermore, the models adhere to 
the observed color-magnitude relation, with some scatter We 
can confirm both results, though some models lie outside our 
2cr contours because their peak magnitudes are at the bright 
end of the observed distribution. 

Blondin et al. (201 1) excluded six of the models on the basis 
of spectral comparisons, namely the dc_2 and dc_3 versions of 
asym_3, asym_6 and asym_8. Interestingly, the asym_3 mod- 
els are the highest ranked asymmetric models according to our 
method (0.662 < «/)) < 0.664), and the asym_6 and asym_8 
models range in mid-field (0.632 < ((/)) < 0.647). However, 
out of a subset of eight models which were analyzed in detail 
in Blondin et al. (2011), iso_3_dc_l showed the best spectral 
match with observations. This model is one of the highest 
ranked models in our analysis as well (((/)) = 0.747). On 
the other hand, the second and third best spectral matches, 
asym_l_dc_3 and iso_6_dc_5, range toward the bottom of the 
table in our analysis (0.544 < «/» < 0.600). Such discrep- 
ancies demonstrate that spectral and light curve comparisons 
can arrive at quite different conclusions. One particular case 
of disagreement, iso_6_dc_2, is discussed in more detail in 
Section 4.2. 

3.2. The W7 Model 

As expected from the discussion in Section 2.1, Table 2 
shows that W7 agrees with observed light curves very well. 
The average figures of merit for the light curve fits are {f^} - 
0.63 ±0.1, </p) = 0.78 + 0.09 and </) = 0.706 + 0.04, as 
shown by the red triangle in Figure 6. Notably, W7 has the 
smallest variance of / between fits of any model, indicating 
that it fits well regardless of what filter bands and epochs are 
fit. The right column of Figure 7 shows UBVR light curve 
fits to the W7 model, fit over the entire epoch range computed 
by PHOENIX. The goodness-of-fit parameter suffers mainly 
from two features of the light curves, namely the high U-band 
peak brightness and the strong second peak in R-band. 

The fitted parameters for W7 match observations as well 
as the light curves, as shown with the black data points in 
Figure 10. W7 also lies well within the range of the popula- 
tion model, though slightly on the bright side for its stretch 
and color Since W7 is not a very physical model (see Section 
2.1), there are no deep insights to be taken away from these re- 
sults. They simply demonstrate that our method arrives at the 
accepted conclusion that W7 represents observed light curves 
surprisingly well. 

3.3. Pure Deflagration Models 

Though underluminous SNela have recently received in- 
creased attention (Foley et al. 2012), they constitute at best a 
sub-class of SNela. They are not well represented in observed 
samples because they are both less frequent and dim, and thus 
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Figure 7. UBVR light curve fits to tlu'ee KRW09 models (left three columns) and W7 (right column). The lines have the same meaning as in Figure 2. Left 
column: the 125°viewing angle of the iso_8_dc_l model, the viewing angle with the highest overall figure of merit ((/) = 0.848). Second column: The 92°viewing 
angle of the asym_3_dc_2 model, the highest ranked asymmetric model (((/>> = 0.664). The light curves look rather similar to the isotropic model in the left 
column. Third column: The 88° viewing angle of the KRW09 Jso_6_dcJ2 model, one of the lower-ranked models (((/>) = 0.557). The fit shown here excludes 
early and late epochs, since the fit with all epochs does not converge. Right column: The W7 model is well fit with SALT2 across the entire epoch range, 
particularly in the B and V bands («/» = 0.706). 
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Figure 8. Fit results for the KRW09 models. Top row: Fit results for all viewing angles of all KRW09 models, plotted without error bars for clarity. The KRW09 
models occupy a favorable region of parameter space, with a few models outside the 2o" contours in the stretch-magnitude and color-magnitude planes. On 
average, the isotropic models (black) are favored over the asymmetric models (red). Bottom row: Fit results for several individual models including iso_5_dc_l 
(red, highest ((f))), iso_6_dc_3 (green, lowest ((f))), iso_3_dc_3 (pink, highest ((/p))), iso_l_dc_3 (cyan, lowest ((fp))) and asym_5_dc_2 (orange, lowest (fp) of 
any viewing angle). Even the farthest outlying viewing angle of asym_5_dc_2 is not excluded by the population model. 
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Figure 9. UBVR light curve fits to the PureDef_0128_256 (left column) and 
PureDef_1700_384 (right column) models. The lines have the same meaning 
as in Figure 2. PureDef_()128-256 is the PureDef model with the best overall 
figure of merit, while PureDef_1700_384 has one of the worst. See Section 
3.3 for a discussion of the large eiTor bars in the PureDef_1700_384 fit. 



hard to detect. Pure deflagration models are generally known 
to result in underluminous SNela and may afford an explana- 
tion of these events. However, we expect pure deflagration 
models to receive relatively low figures of merit for two rea- 
sons: first, the SALT2 model was trained on normal SNela 
and typically does not fit peculiar la well; and secondly, only 
a small fraction of the SNela observed in nature are underlu- 
minous, and the population model does not favor them as a 
result. We return to this issue in Section 5. 

Figure 6 demonstrates that the pure deflagration models, 
shown with orange squares, separate into two groups: those 
with few ignition points (63, 128 and 150, 0.388 < (/) < 
0.487) and those with many (1100, 1700 and 3500, 0.185 < 
(/) < 0.212). Both groups occupy untypical locations in 
(./f-)"(./p) space, as they combine relatively poor light curve 
fits with either favorable or excluded fitted parameters. 

The left column of Figure 9 shows light curve fits to the 
few-bubble model PureDef_0128_256. The fit is relatively 
poor, and thus receives a low figure of merit (/fi), as do the 
other few-bubble models (0.24 < </^2) < 0.33). The fitted pa- 
rameters shown in Figure 10, however, reveal a different pic- 
ture. The few-bubble models (red, cyan and blue data points) 
can be accommodated by the population model, with figures 
of merit of 0.53 < </p) < 0.65. We note that the few-bubble 
models exhibit very small error bars on stretch and magnitude, 
despite their mediocre light curve fits. As discussed in Sec- 
tion 2.3, poor fits are generally more sensitive to changes in 
filter bands and epoch range, and we might thus expect large 
uncertainties on the PureDef models. However, the small er- 
ror bars are somewhat deceiving because some of the mod- 
els have only a few data points before -10 days from peak, 
meaning the fits starting at -10 and -20 days are virtually 
identical. Despite this caveat, adding a set of fits starting at 
-5 days does not significantly increase the uncertainties on 
stretch and magnitude. Thus, the few-bubble models repre- 



sent a rare case where the fit results in stretch and magnitude 
are relatively certain despite a mediocre light curve fit. 

In contrast, the many-bubble models (purple, yellow and 
green data points in Figure 10) are clearly excluded by the 
population model (0 < (/p) < 0.04). Their light curve 
fits, however, receive slightly higher figures of merit (0.37 < 
(/f2) < 0.40), seemingly in conflict with the light curve 
fit to the many-bubble model PureDef-1700-384 (right col- 
umn of Figure 9). Visually, this fit looks worse than that 
of PureDef_0128_256 (left column). The goodness-of-fit for 
the PureDef_1700_384 model, however, is not as poor as 
one would expect from a visual examination of the fit, with 
{f^i) - 0.37. The particular fit shown in Figure 9 only runs 
from -10 to -1-20 days, and the epochs outside this range are 
clearly incompatible with SALT2. Nevertheless, this partic- 
ular fit receives = 2.82 which suggests a reasonable fit. 
The reason is that the error bars on the SALT2 model light 
curves are much larger than for the other fits. The SALT2 
model assigns these increased uncertainties whenever the fit 
is driven to an extreme region of parameter space, in this case 
large stretch, red color and dim magnitude. Since very few 
such events are observed in nature, the SALT2 flux surfaces 
are poorly constrained which is reflected in the larger uncer- 
tainties. At first sight, the larger uncertainties seem to pose a 
problem since they cause {f^i) to reward fits with extreme pa- 
rameters. However, extreme fitted parameters automatically 
mean a very small likelihood of observing a comparable SNIa 
in nature, and thus a very low (/p). Indeed, (/p) - 0.04 for the 
PureDef_1700_384 model, meaning that the model is deemed 
incompatible with observations despite its "unfairly" reason- 
able (/;2). Since we are not particularly interested in the exact 
numerical values of (/) for poorly fitting models, the imbal- 
ance in {f^i) does not lead to false conclusions. 

Overall, our analysis has revealed an interesting distinction 
between models with few and many bubbles. At least for this 
particular set of pure deflagration models, realizations with 
too many ignition bubbles are ruled out by observations, while 
those with few bubbles might be of further interest. We cau- 
tion, however, that these results are preliminary, and may be 
revised once the final set of models is published. 

3.4. P re-Expanded Models 

Since the pre-expansion of the white dwarf was varied over 
a large range (30% to 80% of the binding energy), we expect 
the pre-expanded models to give diverse results. The fitted 
parameters shown in Figure 1 1 confirm this expectation. The 
PreExp30 and PreExp60 models can be accommodated by the 
population model, though some viewing angles lie beyond the 
2cr contours of the stretch-magnitude and color-magnitude 
planes (0.48 < ((/p)) < 0.56). Some viewing angles of the 
third model, PreExp80, lie outside the 3cr contours, causing a 
poor figure of merit for the model overall (((/p)) = 0.14). The 
amount of pre-expansion appears to mostly change the peak 
magnitude (since it determines how much material is burned 
to ^^Ni), but not the decline rate. Thus, the models end up on 
a trajectory perpendicular to the Phillips relation. 

Since the detonation was initiated significantly off-center, 
we expect the pre-expanded models to exhibit strong varia- 
tions with viewing angle. Indeed, the PreExp80 model shows 
the strongest variations in / between viewing angles of any 
model (cTg - 0.076). Overall, the analysis shows that the para- 
metric, pre-expanded, pure detonation models have trouble re- 
producing observed light curves. However, the results allow 
us to constrain which combinations of pre-expansion, contrac- 
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Figure 10. Fit results for the W7 and pure deflagration models. The W7 model (black data point) is known to reproduce observed light curves well which is 
confirmed here, though W7 lies toward the bright end of the stretch-magnitude and color-magnitude relations. The pure deflagration models split into two groups, 
those with a few ignition bubbles (PureDef_0128_256, PureDef_0150_128 and PureDef_()063_128; red, cyan and blue data points) which are not in the center of 
the distribution, but certainly not ruled out either, and three models with many ignition bubbles. The latter are clearly ruled out by the population model as the 
values of both their stretch and color are higher than observed SNela. The PureDef_l 100-256 model exhibits a stretch value of X[ = 4.8 which lies outside the 
plotted range. 
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Figure 11. Fit results for the three pre-expanded models. Almost all viewing 
angles lie within the allowed region of color-stretch space (not shown). The 
PreExp80 model exhibits the largest scatter of (/) with viewing angle of any 
model. The amount of pre-expansion (30%, 60% and 80% of the binding 
energy, respectively) appears to mostly change the peak magnitude of the 
light curves, but not the decline rate. As a consequence, the three models lie 
on a trajectory perpendicular to the Phillips relation. 

tion and detonation criteria result in a reasonable amount of 
''^Ni, and thus reasonable magnitudes. 

4. DISCUSSION 

We have introduced a new method for evaluating simulated 
SNla light curves with observations, and applied it to a range 
of explosion models. In this section we address some remain- 
ing questions. In Section 4. 1 we discuss the parameter choices 
we made in designing the population model, and the reliabil- 
ity of those choices. In Section 4.2 we investigate an apparent 
contradiction between our evaluation method and visual light 
curve comparisons for one particular explosion model, and 
discuss potential issues with visual comparisons in general. 

4. 1 . Uncertainties in the Population Model 

The population model parameters listed in Table 1 were de- 
termined by comparing data and Monte-Carlo simulations in 
K13, and from examining the extreme tails of the nearby sam- 
ple. The global SALT2 parameters a, /3, Mq and ctm, how- 
ever, are derived from a Hubble diagram fit, with fits to dif- 
ferent data sets resulting in somewhat different values of the 
parameters (Table 3). Such discrepancies arise either because 
surveys are biased, or because the true, underlying parame- 
ters are redshift-dependent, and different surveys probe dif- 



ferent redshift ranges. Table 3 only lists analyses which used 
the GIO version of the SALT2 model or the SiFTO model, 
where SiFTO parameters were re-scaled to the corresponding 
SALT2 values (GIO). 

For the population model, we rely on the results of the most 
recent SNLS3 analyses, a = 0.13, j8 = 3.2 and Mq = -19.095 
(GIO; Sullivan et al. 201 1; Conley et al. 201 1). These SNLS3 
results, however, are in tension with the most recent analysis 
of SDSS data (Campbell et al. 2013) who find an unusually 
high value of a = 0.22 ± 0.02 in their photometric sample, 
4.5cr off the SNLS3 value. However, they suspect that the 
discrepancy may be due to non-la contamination, and find a 
lower value of a = 0.16 + 0.02 when only fitting spectroscop- 
ically confirmed events. 

In order to quantify the impact of the uncertainties on a 
and p, we plot confidence contours of the population model 
for values which differ from the fiducial values by 2cr, using 
the uncertainties from GIO. The top two panels of Figure 12 
show stretch-magnitude and color-magnitude contours for our 
fiducial population model {a - 0.13 and [5 - 3.2), as well as 
contours for a - 0.106 and a - 0.154. The contours are 
barely distinguishable from the fiducial model. In the bottom 
panels of Figure 12 we show the same plots, with [3 - 2.94 
and j6 = 3.46. The differences to the contours of the fidu- 
cial model are marginally larger than when varying a, but still 
small, particularly for the 68% and 95% contours. In con- 
clusion, we find that our values for a and fi reflect the most 
recent observational data, and that our population model is 
barely sensitive to their uncertainties. 

While the choice of a and yS does not have a significant 
impact on our population model, the choice of Mq clearly 
does. Its impact would manifest itself in Figure 12 by shift- 
ing the contours up and down in magnitude, while the explo- 
sion model M() would stay the same. As the scatter in SNla 
magnitudes is only cry^ - 0.13, fractions of a magnitude mat- 
ter We note that, in the context of the SALT2 model, Mq 
is the uncorrected {xi - c - 0) absolute B-band magnitude 
of SNela, not their average magnitude. While the magnitude 
distribution of SNela has been quantified (see, e.g., Yasuda & 
Fukugita 2010), Mo must be derived from a Hubble diagram 
fit to SALT2 fitted SNela. Unfortunately, Mq is often ignored 
in cosmological analyses since it is degenerate with Hq and 



Light Curve Comparisons 



17 



-20. r, 




-4 -2 2 4 X4 -0.2 0.0 0.2 0.4 O.G 



Stretch (xi) Color (c) 

Figure 12. Sensitivity of the population model to changes in a and /?. Each 
row shows stretch-magnitude and color-magnitude contours for the fiducial 
population model (ff = 0. 13,/J = 3.2), as well as contours with a ± Icr^ (top 
panels) and /? ± 2(T^ (bottom panels), using the uncertainties from GIO. The 
population model is remarkably stable against those changes in a and /?. 

can be scaled out without influencing cosmological results. 

As with a and /3, we choose the value of Mq found in 
the most recent SNLS 3-year analysis of GIO. It is crucial 
to note that this choice of Mq corresponds to their assumed 
Hubble constant. Ho = 70kms 'Mpc ' (GIO). This value 
is consistent with the most recent cosmological data sets, 
e.g. the WMAP9 analysis of Hinshaw et al. (2012) who find 
Ho - 69.3 + 0.8. At this low level of uncertainty, the intrin- 
sic scatter in Mb dominates over uncertainties in Hq. We do, 
however, caution that significant shifts in the best estimate of 
Ho could change Mo noticeably, namely by 0.03 magnitudes 
for each 1 kms"' Mpc"' change in Ho- This implicit uncer- 
tainty in Mo should be seen as part of our standard deviation 
in magnitude, o-f^. 

Table 3 shows that different analyses arrive at different 
conclusions regarding the intrinsic scatter in magnitude, cr^, 
which is an important input parameter to our population 
model since it directly determines the model's tolerance 
for magnitudes offset from the mean magnitude-stretch and 
magnitude-color relations. Our value for the intrinsic scat- 
ter in magnitude, ctm - 0.13, represents the coherent scatter 
model of K13. They found that this model (a wavelength- 
independent scatter of 0.13 magnitudes) reproduces the scat- 
ter observed in the Hubble diagram fit to SDSS and SNLS 
data. 

Besides the parameter choices, we note various other sys- 
tematic effects which might affect the accuracy of our popu- 
lation model. For example, Conley et al. (201 1) found a mild 
redshift dependence in the best-fit values for a and Sim- 
ilarly, it is unclear how exactly the stretch and color popula- 
tions of SNela depend on host galaxy, and thus redshift (Sul- 
livan et al. 2006; Lampeitl et al. 2010; Sullivan et al. 2010; 
Kelly et al. 2010; Smith et al. 2012). The observed evolution 
of xi and c with redshift can mainly be ascribed to Malmquist 
bias, but even after accounting for this type of bias the pop- 
ulation model parameters from SDSS and SNLS are some- 



what inconsistent, with SNLS producing more bright, blue ex- 
plosions (K13). This disagreement is qualitatively explained 
by the observation that young, star-forming galaxies tend to 
produce brighter and bluer SNe (Sullivan et al. 2006). Nev- 
ertheless, the best-fit parent populations in color and stretch 
are slightly different for SDSS and SNLS. Furthermore, the 
amount of data used for the analysis of K13 only allowed for 
a rough fit of the simulated distributions in xi and c to those 
observed in SDSS and SNLS. With larger future data sets, a 
population model should be constructed from fitting functions 
with large numbers of degrees of freedom, including a depen- 
dence on redshift and host galaxy properties. Such an inves- 
tigation is beyond the scope of this paper, but is a promising 
avenue for future research, particularly once DES data will be 
available. 

Finally, our population model makes an implicit assump- 
tion about the nature of color variations in SNela. When us- 
ing the population model to evaluate simulated SNela light 
curves, we assume that the color-magnitude relation is in- 
trinsic, and should thus be obeyed by explosion models. If, 
however, this relation was caused by processes other than the 
SNla explosion itself, for example dust extinction, explosion 
models should show discrepancies, such as never producing 
highly reddened events. The assumption of an intrinsic color- 
magnitude relation can only render our model too lenient to- 
ward explosion models. Consider the extreme case of a color- 
magnitude relation which is entirely caused by dust redden- 
ing; in that case, explosion models should occupy one par- 
ticular locus in color-magnitude space which would still be 
allowed by the population model (the locus of unreddened 
SNela). However, the model would also allow a range of 
other values of color and magnitude which correspond to the 
relation imposed by dust reddening. This extreme scenario 
has been ruled out, for example by Maeda et al. (2011) who 
investigated a sample of SNela which were deemed to be es- 
sentially free of host reddening, and found strong evidence 
for an intrinsic color-magnitude relation. Nevertheless, one 
should keep this caveat in mind when constraining explosion 
models using the color-magnitude relation. 

4.2. The Perils of Visual Light Curve Comparisons 

We have previously alluded to issues with visual compar- 
isons between simulated and observed light curves. In some 
cases, only a rough comparison with light curves is desired, 
for example if an explosion model is not intended to rep- 
resent normal SNela but a sub-group such as sub-luminous 
events. In such cases, visual comparisons are perfectly ap- 
propriate (e.g. Kromer et al. 2012). Often, however, the pur- 
pose of visual comparisons is to determine whether an ex- 
plosion model reproduces the light curves of normal SNela 
(see, e.g., Woosley et al. 2007; Kromer et al. 2010; Jack et al. 
2011; Pakmor et al. 2012). Here we investigate a particular 
example of this technique. Both KRW09 and Blondin et al. 
(201 1) overplot light curves of the KRW09_iso_6_dc_2 model 
(88° viewing angle) and the observed SNla 2003du, and note 
that there is good visual agreement. Contrary to this observa- 
tion, the KRW09-iso_6-dc_2 model receives one of the worst 
overall figures of merit among the KRW09 models, though 
it is still by no means ruled out. The light curves of the 88° 
viewing angle perform slightly better than the average, with 
</) = 0.57, </f2) = 0.56 and </p) = 0.58. In this section, we 
investigate the reasons for the discrepancy between our eval- 
uation method and the visual comparison of light curves. We 
emphasize that we pick the example of KRW09Jso-6-dc^ 
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Table 3 

Values for the global SALT2 parameters from the literature 



Reference 



Data set 



Model 



Mo 



Ho 



Sullivan et al. (2011) ' 
Conley etal. (2011) + 
Guy et al. (2010)'+ 
Marriner et al. (2011) 
Campbell et al. (2013) 



SNLS 

SNLS, SDSS, low-z, HST 
SNLS 

SDSS (4 seasons) 
SDSS (all seasons) 



SiFTO 
SiFTO 
SALT2/ SiFTO 
SALT2 
SALT2 



0.132 
0.132 
0.124 
0.135 
0.22 



0.138 
0.138 

0.012 

+0.033 
-0.017 
t0.02 



3.17-3.25 
3.17-3.25 
3.17 ±0.13 

-, 1Q+0.14 
-^■^'-0.24 

3.12 ±0.12 



-19.095 ±0.0321 



70 



0.068-0.113 
0.087 



Note. — The model column refers to the fit model for which values are given in the reference, even though other fit models may also have been 
used for the analysis. Wherever a range of parameters is given, different fits were performed, and the differences between their results were greater 
than the statistical eiTor quoted. 

t The parameters from the recent SNLS3 analyses were re-scaled from SiFTO. Since SiFTO uses slightly different color and stretch connections, a,/? 
and the uncorrected magnitude Mq also take on slightly dilferent meanings. Linear re-scaling formulae for a and fi are given in GIO. In their paper, 
the SiFTO Mq is given as A/q = 19.218 ± 0.032 and was re-scaled to the value above (Guy 2012, private communication). 
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Figure 13. Visual light curve comparison. The dots and crosses show the 
light curves of 2003du (Stanishev et al. 2007) for the two different distance 
moduli suggested in their paper, 32.79 (dots) and 32.42 (crosses). The solid 
lines show light curves of the 88° viewing angle of the KRW09-iso_6-dc_2 
model. The dashed lines show light curves of the W7 model. 

because the explosion model light curves have already been 
analyzed in this paper, not because it represents a particularly 
good or bad choice of data to compare to. 

First, and most importantly, the distance modulus to 
2003du, and thus its magnitude, are highly uncertain. Stan- 
ishev et al. (2007) quote two different distance moduli of 
yU - 32.42 and ^ - 32.79, leading to a 0.37 magnitude 
difference. The reason for this uncertainty is the uncertain 
peculiar velocity of the host galaxy of 2003du which could 
be a few hundred kms corresponding to a redshift un- 
certainty cr. a; 0.001. Given that the redshift of 2003du is 
z — 0.007, this uncertainty leads to a 0.3 magnitude uncer- 
tainty. Clearly, using very nearby SNe for absolute magni- 
tude comparisons is dangerous. Using jd - 32.79, the value 
adopted in Kasen et al. (2009), 2003du has Mb = -19.30 
(Stanishev et al. 2007), a value which coincides well with the 
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Figure 14. Fit results for the KRW09Jso_6_dc J delayed-detonation model 
(green square) and the observed SN 2003du (CfA3 data, black circle). If 
we assume the absolute magnitude resulting from the data of (Stanishev 
et al. 2007) and their suggested distance modulus of p = 32.79, the point 
shifts to Mb = -19.3 (red triangle). While the fitted parameters of both 
KRW09_iso_6_dc_2 and 2003du are in good agreement with observations, 
they occupy different loci on the parameter planes, depending on which dis- 
tance modulus is assumed. Furthermore, the assumed distance modulus de- 
termines whether 2003du is a "normal" SN in the sense that it lies close to 
the center of the color-magnitude distribution. The large error bars on the 
KRW09Jso_6_dc_2 model are due to some barely converged fits and exag- 
gerate the real uncertainty in the lit results. See Section 4.2 for a detailed 
discussion. 

KRW09_iso_6_dc_2 model. Figure 13 shows a visual compar- 
ison of the UBVRl light curves of 2003du and the 88° view- 
ing angle of the KRW09 Jso_6_dc_2 model. For comparison, 
the 2003du light curves are also plotted using the alternative 
distance modulus of jU = 32.42 which leads to a catastrophic 
fit. 

Secondly, the light curve fits to the 88° viewing angle of 
KJlW09_iso_6_dc_2 are not as good as the visual comparison 
suggests. While the particular fit shown in Figure 7 looks rea- 
sonable, the excluded early epochs exhibit a shallow rise time 
slope which cannot be accommodated by the SALT2 model, 
and causes the fit to fail when early epochs are included. Re- 
turning to the visual comparison shown in Figure 13, we no- 
tice that the rising slope of KRW09 Jso_6_dc^ is indeed quite 
shallow compared to the W7 model (dashed lines). The shal- 
low rising slope of some of the KRW09 models was pointed 
out in Figure 8 of Blondin et al. (201 1), but the comparison to 
2003du does not highlight this issue, since there are few data 
points at early times. Our fitting method captures the differ- 
ent rising slopes by assigning the KRW09 jso_6_dc_2 model 
a significantly larger stretch than 2003du (Figure 14). 

Finally, we need to address the question whether 2003du is 
a truly "photometrically normal" SNla as claimed by Stani- 
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shev et al. (2007). Figure 14 shows the fitted parameters for 
2003du, using CfA3 observations. Given the redshift assumed 
in the survey data, the fitted absolute magnitude of 2003du 
is -18.81. With this magnitude, 2003du lies at the center 
of the stretch-magnitude distribution, but somewhat off the 
main color-magnitude relation and has a figure of merit of 
f^i - 0.60 (black circle). If we use the peak magnitude de- 
rived from the larger distance modulus, Mb = -19.3, 2003du 
moves toward the center of the color-magnitude relation and 
receives ff. = 0.79 (red triangle). Either way 2003du is lo- 
cated at the blue end of the bulk distribution, and while it may 
qualify as a "photometrically normal" SNIa, it is not an ideal 
candidate for light curve comparisons. 

In summary, there are good reasons to be careful with vi- 
sual light curve comparisons, most importantly that features 
such as a shallow rising slope can easily be missed. 2003du 
is a particularly unfortunate case where a large uncertainty 
in the distance modulus makes any inference about explosion 
models difficult. 

5. CONCLUSION 

We have presented a rigorous new method to evaluate ex- 
plosion model light curves using observed data. We rely on 
a data-driven model to represent the heterogeneous family of 
SNla light curves. For this purpose, we choose the most re- 
cently trained model, SALT2. For each explosion model, we 
perform multiple fits to the SALT2 model using a wide range 
of reasonable choices for filter bands and epoch range, since 
those choices affect the fit results. For each fit, we derive fit 
results in the SALT2 parameter space of stretch, color and 
magnitude. We take the weighted mean of those parameters 
to be the best estimate of the fit results for a given set of explo- 
sion model light curves. We compare these fitted parameters 
to a parametric population model in stretch-color-magnitude 
space which we construct from the color and stretch distribu- 
tions of K13, and values for a, /3, and Mq from the literature. 
We extend this population model with small outlier popula- 
tions in color and magnitude to account for observed outlier 
events in a nearby data set. The final figure of merit is com- 
posed of two individual figures of merit for the goodness-of-fit 
of a light curve, and for the likelihood of observing an event 
with the fitted stretch, color and magnitude in nature. 

We have used this method to evaluate a variety of explosion 
models, and found their figures of merit to range from 0.185 
to 0.786, given an allowed range of (worst) to 1 (best). We 
found the delayed-detonation models of KRW09 to agree well 
with observed light curves, particularly the isotropic models. 
We confirmed that the W7 model also matches observed light 
curves well. We found a wide range of figures of merit for the 
pure-deflagration models analyzed, clearly distinguishing be- 
tween few-bubble and many-bubble variations of those mod- 
els. Finally, we found that a suite of off-centered detonation 
models does not reproduce the stretch-magnitude and color- 
magnitude relations. 

We discussed uncertainties on our population model, and 
concluded that our model is remarkably stable within the cur- 
rent uncertainties on the SALT2 parameters a and j6. How- 
ever, due to the uncertainty in Hq it is stiU somewhat difficult 
to define the absolute magnitudes of SNela to the desired ac- 
curacy. We conclude that our population model is valid within 
the latest cosmological limits on Hq. Furthermore, we inves- 
tigated some seemingly contradictory inferences from visual 
Ught curve comparisons and our method. We concluded ffiat 
visual light curve comparisons can be misleading, particularly 



if the absolute magnitude of the observed SNla is not well de- 
termined. 

We have left various promising avenues of research for fu- 
ture investigations, and briefly discuss a few of them. First, we 
did not consider the IR regions of the explosion model light 
curves since they are notoriously difficult to model accurately 
in radiative transfer calculations. Our method could easily be 
extended by using a data-driven model which covers the IR, 
such as SNooPy. Using a different data-driven model, how- 
ever, would mean having to develop a new population model 
in the new parameter space. 

Secondly, our analysis neglected the time of explosion 
which is known for explosion models. While the time be- 
tween explosion and peak brightness is unknown for individ- 
ual observed SNela, recent observations have shown the slope 
of the rising light curves to closely follow a f law. This be- 
havior was demonstrated for observations of a single well- 
observed SNla (Nugent et al. 2011), the statistical average 
of a large sample (Hayden et al. 2010; Ganeshalingam et al. 
2011) and through analytical modeling (Piro & Nakar 2012, 
and references therein). The f dependence could be used to 
extrapolate the SALT2 model to very early times, and thus in- 
clude the time of explosion as a data point in the Ught curve 
fit. The constraint of zero flux at explosion would place tight 
constraints on the rising light curves produced by explosion 
models, which have been found to be too shallow in various 
explosion models. 

Third, there are uncertainties in our population model 
which are difficult to address with current data, such as the 
best functional form to parametrize the distributions of stretch 
and color, the tails of these distributions at extreme, rare val- 
ues, as well as the dependence of the population model on 
redshift. Given the larger data sets of the future, we envision 
a global fit of the data to a function with many free parame- 
ters. As with the analysis of K13, such strategies still have to 
rely on an accurate model of survey biases. 

Finally, data-driven models do not parametrize peculiar 
SNela. Thus, our method assigns explosion models with pe- 
culiar Ught curves a relatively low figure of merit, even though 
they might match peculiar SNela observed in nature. For ex- 
plosion models which might reproduce pecuUar SNela such 
as sub-luminous Type lax, we could use the photometric clas- 
sification algorithm of Sako et al. (2011). In this analysis, 
core-collapse SN templates are replaced with templates for 
peculiar SNela, and explosion models are identified as either 
a normal SNla or any of the known pecuUar-Ia. 
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APPENDIX 
A. SURVEY DATA 

Here we briefly review the survey data which were used to check the consistency of our population model in Section 2.4, and 
to extend it to represent outliers in color and magnitude. We describe the SDSS and CfA3 surveys, the selection cuts applied to 
these samples, and our procedure to reconstruct the rest-frame peak magnitude. We note that the population model of K13 itself 
was based on both the SDSS and SNLS (GIO) surveys, and refer the reader to those papers for detailed information on the data 
sets. 
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A. 1 . The SDSS Survey 

The Sloan Digital Sky Survey-H Supernova Survey (SDSS-H, York et al. 2000; Frieman et al. 2008) used the SDSS camera 
and telescope (Gunn et al. 1998, 2006) to search for SNe in three seasons from 2005 to 2007 (Sako et al. 2008). The survey 
scanned a 300 sq. degree region called Stripe 82 with a typical cadence of four nights, obtaining images in the ugriz filter bands 
(Fukugita et al. 1996). The final SNla photometry was derived using the Scene Modeling Photometry technique (Holtzman et al. 
2008), and used in this paper as well as K13. The SDSS survey discovered about 500 spectroscopically confirmed SNela. 

A.2. The CfA3 Survey 

The CfA3 survey (Hicken et al. 2009) was conducted between 2001 and 2008 at the F. L. Whipple Observatory of the Harvard- 
Smithsonian Center for Astrophysics (CfA). The survey recorded a total of 11,500 observations of 185 Type la SNe below 
z = 0.08. Over its lifetime, CfA3 used two cameras, 4Shooter and Keplercam. The Johnson U, B and V filters were used on 
all three cameras, but the IR filters were changed during the survey. Unlike SDSS, the CfA3 survey did not discover SNela 
itself, but followed up discoveries by other groups and amateur astronomers, about half of them from the Lick Observatory 
Supernova Search (LOSS, Filippenko et al. 2001). Hicken et al. (2009) emphasize that the CfA3 sample is not representative of 
the underlying SNIa population since the observing strategy favored yoimg and extreme events. 

A.3. Selection Cuts 

The SDSS and CfA3 samples used in Section 2.4 were fit with SALT2 using snana. In order to ensure reliable light curve fits, 
we appUed the following selection cuts: 

• At least three filters need to have at least one observation each with signal-to-noise ratio > 5. 

• There must be at least one observation before B-band peak, and at least one later than 10 rest-frame days after peak. This 
criterion ensures that xi is somewhat constrained. 

• There must be a total of at least 12 observations in all filters combined. 

• The fit probability (as derived from;^'^/A'(iof) is at least 1%. 

• The SN must be a confirmed Type la event. 

These cuts are deliberately not too strict, since we do not want to exclude interesting populations which might deviate from our 
population model. 

A.4. Reconstruction of rest-frame magnitudes 

A SALT2 fit to observed SNela returns values for m^, the peak magnitude in a redshifted B-band, for x\ = c = 0. For the 
purpose of comparing explosion models to data, however, we are interested in the rest-frame characteristics of an explosion, 
namely the best-fit, rest-frame, B-band peak magnitude, Mg**'. As we generate our mock light cmves of the explosion models at 
a negligible redshift, A^^^*' ^ me - yu. For observed SNela at higher redshift, however, the filter transmission function for B-band 
is redshifted between the rest and observer frames, meaning that K-corrections are necessary to translate between otb and A/^^*'. 
For observed SNela at higher redshift, appropriate K-corrections are used to compute M?*'. 



B. POPULATION MODEL 

In Section 2.4 we stated the probability distribution for our population model. Here, we derive it more rigorously, and compute 
marginalized probability distributions in two variables. We assume that the variables x = {x\,c. Am) are independent and Gaus- 
sian. For now, we ignore their asymmetric cr values and their potentially non-zero means. We can write the probabiUty density 



as a trivariate Gaussian, dP(xi,c, Am) = A^eJ" ^ 
parameter space y = ix\,c, Mb) is given by y 



^dxi dcdAu with C = diag(cr^p , cr^ 
Tx, where T and the new inverse covariance matrix D"" are 
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The probabiUty density is now 

dP(x\ , c, Mb) = A^ exp 



dxi dc dM^ = N exp 
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Inserting the mean values yU^i, and fiM recovers Equation (9). As explained in Section 2.4, one Gaussian probability function 
per parameter does not capture outhers in color and magnitude. Thus, we represent each parameter by a sum of Gaussians with 
independent means, lower and upper variances such that 
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where the weights w, sum to 1. Due to the Hnearity of the system, all the above equations still hold for the sums of Gaussians. The 
remaining challenge is to marginalize the distribution over each one of the variables to obtain probability contours in planes of 
two variables for plotting. For any combination of xi , c and the corrected magnitude, Mb + axi -/3c, their probability distributions 
are linear sums of independent Gaussians (Figure 5). In the cases of the xi - Ms and c-Mb planes, however, we need to integrate 
over the asymmetric Gaussian distributions of both the third variable and Mb (since it is not independent of the third variable). 
For the case of integrating over i Gaussians in xi, their ± asymmetry, and j (symmetric) Gaussians in Mb, we obtain 



dcdMB dc 
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The confidence contours for these probability distributions were computed numerically, and are shown in Figure 4c. 
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